<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comprehensive Framework for Sorting Benchmarks</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Sergey Madaminov Department of Computer Science Stony Brook University New Computer Science Building Stony Brook</institution>
          ,
          <addr-line>New York 11794-2424</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>In the early days, sorting accounted for almost 25% of all cycles that computers were spending. That led to the development of a variety of sorting algorithms and their implementations, as well as the creation of sorting benchmarks. However, those benchmarks do not account well for increasing variability in the nature of data and they also fail to assess architectural features of di↵erent computer systems depending on the choice of the sorting algorithm. This work proposes the development of a comprehensive sorting benchmark framework to address those issues and to help with the evaluation of sorting algorithms from both software and hardware perspectives.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Sorting is an important operation that computers have
been performing from the early days [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. This led to the
development of various sorting algorithms. As it has proved to
be important at datacenter scale [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ] and it targeted
different scenarios and systems, various algorithms were
developed for general purpose sorting by using CPUs [
        <xref ref-type="bibr" rid="ref15 ref22 ref5">22, 15, 5</xref>
        ],
for sorting that is suitable for highly parallel systems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and
for sorting using other types of architectures [
        <xref ref-type="bibr" rid="ref10 ref19">10, 19</xref>
        ].
However, with the rapid pace of increase in the scale of a sorting
problem, the question of which algorithm to choose remains
persistent. To answer this question, one needs to have a
sorting benchmark that is capable of providing enough
information for analyzing the needs and eciency of available
and proposed algorithms for a given purpose.
      </p>
      <p>
        The idea of having benchmarks is not novel and there is
a body of work done on the benchmarks for system
components such as CPU [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], applications such as databases [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ],
and systems for processing cloud workloads [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Some
existing studies have targeted sorting specifically [
        <xref ref-type="bibr" rid="ref13 ref14 ref21 ref7">13, 7, 21, 14</xref>
        ].
Generally stated, the di↵erent types of benchmarks cover
di↵erent parts of sorting systems from both architectural
perspectives as well as algorithmic and software
implementations.
      </p>
      <p>
        In spite of the rich body of knowledge on benchmarks,
drastic changes in computing today have made some of the
benchmarks obsolete. For instance, benchmarks such as
PennySort and TeraByte Sort are deprecated due to the
substantial growth in computational power that allows
handling much larger data sets [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Similarly, the nature of
the data itself may also di↵er and while there is a suggested
structure of a record to sort [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] that defines 100-byte records,
not all studies follow it [
        <xref ref-type="bibr" rid="ref20 ref23">20, 23</xref>
        ]. Moreover, sorting task itself
can vary a lot: it can be local to a single computer machine
or distributed among many nodes in a cluster, or it can
target di↵erent architectures.
      </p>
      <p>The variety of di↵erent factors makes it unnecessarily
complicated to evaluate sorting algorithms and sorting systems
and compare them against each other. Without a defined
structure of data record or defined distribution, it may
become non-trivial how to compare di↵erent sorting algorithms
or their implementations directly. It becomes even more
complicated when targeted systems are FPGAs as they may
be programmed to process a very specific set of data and
changes in the structure of the data may either significantly
a↵ect results or make it unfeasible to even process that data.</p>
      <p>To e↵ectively analyze the choice of a sorting algorithm
or sorting system, one needs to collect both hardware and
software statistics of any viable approach. While hardware
statistics may include cache performance, branch
misprediction, and TLB misses, the software statistics may include
running time on a particular system and scalability of the
sorting algorithm with the increasing number of available
parallelism or growth in the volume of data.</p>
      <p>To overcome the above issues, this work proposes
developing a comprehensive framework for sorting benchmarks
capable of evaluating various hardware and software aspects
of sorting algorithms and sorting systems while
maintaining ease of use. This work is structured as follows: Section
2 justifies the development of such a framework, Section 3
discusses framework architecture, and Section 4 concludes.
2.</p>
    </sec>
    <sec id="sec-2">
      <title>THE NEED FOR COMPREHENSIVE</title>
    </sec>
    <sec id="sec-3">
      <title>FRAMEWORK</title>
      <p>
        Multiple studies have been done on sorting benchmarks [
        <xref ref-type="bibr" rid="ref24 ref3 ref9">3,
24, 9</xref>
        ]. However, we advocate that there is a need for more
research on that topic. There are three main reasons for
that: first, existing benchmarks do not consider the
variety of input data and its distribution, second, they do not
assess hardware statistics, and, third, they are not a good
fit for a variety of di↵erent computer architectures. The
last one is particularly important as there is a number of
studies targeting various architectures such as GPUs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
FPGAs [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and AVX-based [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. But without a systematic
approach, the task of comparing them against each other
becomes quite challenging. This task of comparing
di↵erent architectures between themselves especially complicated
when only part of the sorting algorithm is implemented. For
example, some studies targeting FPGAs focus on the
merging [
        <xref ref-type="bibr" rid="ref20 ref23">20, 23</xref>
        ]. As such implementations may require data
transfer to and from the sorting system, some level of data
preparation, or may depend on the problem size, it is
unclear how to compare results obtained from di↵erent
architectures. Thus, the proposed framework should provide a
facility to perform a comparison between them. For similar
sorting algorithms, it can be achieved by direct
comparison of similar phases of the algorithms and estimating the
remaining phases, which may include potentially required
communication such as data transfer over the PCIe or
another medium.
      </p>
      <p>
        Many studies related to sorting use record structure
suggested by Datamation sorting benchmark [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], but it is not
universally accepted. Due to variations in record
structure, comparing the results of di↵erent studies directly is not
straightforward. On the other hand, the Datamation sorting
benchmark that defines the structure of a data record
being 100-byte with ten-byte key and ninety-byte value could
have become outdated. The current database vendors and
users should be surveyed to collect prevailing structures of
records and data distributions. However, as some works
may use the di↵erent input data, it is important to allow
variations in the input data. First, it will allow analyzing
studies that use di↵erent input data. Second, it will enable
the comparison with prior work.
      </p>
      <p>It deems important to understand how sorting algorithms
scale with an increasing number of parallelism or volume of
data, which requires collecting corresponding information.
To perform a more thorough evaluation of the sorting
algorithm it is crucial to collect systems statistics such as
memory bandwidth and caches miss rate. While it is possible
to use existing tools for profiling, it requires the algorithm
developer to install and learn a variety of tools. It can be
avoided by adding such functionality into the framework
itself. Some of the algorithms exhibit di↵erent behavior on
systems level, e.g., Quicksort algorithm is known for good
cache behavior and utilization. Gathering more information
can help to get a clear picture of the sorting algorithm, which
in turn can help to reason about the di↵erences between
different sorting algorithms. We suggest that the framework
should not just provide statistical data as feedback, but also
provide an analysis report that identifies weak points of the
algorithm and what potentially can be improved. Moreover,
modern benchmark systems are not easy to use. Thus, the
proposed framework should be user-friendly and should
provide reports for further analysis in a readable format.</p>
      <p>
        With a variety of studies on sorting including recent works
on exploring new computer architectures such as FPGAs [
        <xref ref-type="bibr" rid="ref19 ref20 ref23">19,
20, 23</xref>
        ] and GPUs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and their suitability for sorting,
comparing their result becomes a challenging task. The
proposed framework will strive to address these challenges and
needs while maintaining ease of use. It may be still unclear
how to compare di↵erent computer architectures but this
work sets resolving this problem as one of its targets.
      </p>
    </sec>
    <sec id="sec-4">
      <title>FRAMEWORK ARCHITECTURE</title>
      <p>This section provides a brief overview of potential
framework components and argues for their need. It discusses
various aspects of proposed framework such as data
distribution, collectible system statistics, and some of the other
aspects that include record structure.
3.1</p>
    </sec>
    <sec id="sec-5">
      <title>Data Distribution</title>
      <p>
        The record generator used in the Sort Benchmark [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] can
produce two types of data distribution. Despite a bigger
variety of data being considered by Helman et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], their
work focuses on the structure of the data rather than its
distribution. Based on the nature of the data, it is possible to
have more options in the data distribution and the proposed
framework should account for both data structure and data
distribution. Often, these two features may be independent
of each other so the framework should provide facilities to
combine them together. Thus, it may become possible to
have both staggered data structure with Gaussian
distribution or any other combination of data structure and data
distribution. Table 1 provides the list of some of the
possible distributions to account for. However, similar to the
Datamation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a comprehensive list should be compiled
using input from the database vendors and the database users
to represent the actual workloads that may be found outside
of research groups.
3.2
      </p>
    </sec>
    <sec id="sec-6">
      <title>System Statistics</title>
      <p>
        Currently, to assess systems performance such as
memory bandwidth, the developer has to use tools such as Intel
VTune [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. While in some cases it may be inevitable to
use external software, the framework should collect
statistics where it can and at least provide the list of various
metrics to account for. Table 2 provides the list of some of
the suggested systems statistics to collect.
      </p>
      <p>
        Many modern sorting algorithms have optimal or
nearoptimal complexity, but real implementations may result in
noticeable di↵erences between them. Collecting such
statistics may help to identify bottlenecks that may lead to
further research on how those bottlenecks can be mitigated.
As a na¨ıve example, hugepages may help to reduce TLB
misses [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and using recently introduced high-bandwidth
memories may help to handle the memory bandwidth bound
parts of the sorting algorithms. Moreover, identifying such
bottlenecks may steer hardware research. One can imagine
building a sorting specific accelerator to overcome them. For
example, it may be an FPGA that accelerates a particular
task or even a special-purpose processor that has an ISA
targeting the sorting task.
3.3
      </p>
    </sec>
    <sec id="sec-7">
      <title>Miscellaneous</title>
      <p>
        The data distribution and systems statistics cover many
di↵erent aspects of sorting but there are still some
implementation details and guidelines that may become useful for
algorithm developers. They include using custom
comparators, avoiding using indirect function calls [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and di↵erent
record types with latter being tightly coupled with custom
comparators. Ultimately, evaluation of sorting algorithms
and sorting systems may have more factors to consider that
we have previously defined and it is deemed important to
identify them and leave the framework open to including
them.
4.
      </p>
    </sec>
    <sec id="sec-8">
      <title>CONCLUSIONS</title>
      <p>This work advocates for the development of a
comprehensive framework for sorting benchmarks, which accounts
for various aspects of sorting algorithms starting with
defining the input data and measures both their software and
hardware statistics. Such a framework may help to create
a system to foster the development of sorting algorithms as
well as designing new computer architectures for sorting. We
envision that it will be beneficial for many communities
outside of a group of scientists who work on the development
of new sorting algorithms or modifying the existing ones.
While the work in its preliminary stage, there are many
design choices that have to be done and collecting feedback
from database vendors and users is essential for what are
the common data features and hardware statistics they do
care.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dabral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Palit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ferdman</surname>
          </string-name>
          .
          <article-title>Architectural Support for Dynamic Linking</article-title>
          .
          <source>In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems</source>
          , pages
          <fpage>691</fpage>
          -
          <lpage>702</lpage>
          , New York, NY, USA,
          <year>2015</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Axtmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bingmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sanders</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schulz</surname>
          </string-name>
          .
          <article-title>Practical Massively Parallel Sorting</article-title>
          .
          <source>In Proceedings of the 27th ACM on symposium on Parallelism in Algorithms and Architectures</source>
          , pages
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          . ACM Press,
          <year>June 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Barszcz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Barton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Browning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Carter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dagum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Fatoohi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. O.</given-names>
            <surname>Frederickson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Lasinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. D.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Venkatakrishnan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Weeratunga</surname>
          </string-name>
          .
          <article-title>The NAS Parallel Benchmarks - Summary and Preliminary Results</article-title>
          .
          <source>In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing</source>
          , pages
          <fpage>158</fpage>
          -
          <lpage>165</lpage>
          , New York, NY, USA,
          <year>1991</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bramas</surname>
          </string-name>
          .
          <article-title>A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake</article-title>
          .
          <source>International Journal of Advanced Computer Science and Applications</source>
          ,
          <volume>8</volume>
          (
          <issue>10</issue>
          ),
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bron</surname>
          </string-name>
          . Merge Sort Algorithm [M1].
          <source>Communications of the ACM</source>
          ,
          <volume>15</volume>
          (
          <issue>5</issue>
          ):
          <fpage>357</fpage>
          -
          <lpage>358</lpage>
          , May
          <year>1972</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bucek</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.-D. Lange</surname>
            , and
            <given-names>J. v. Kistowski. SPEC</given-names>
          </string-name>
          <article-title>CPU2017: Next-Generation Compute Benchmark</article-title>
          .
          <source>In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering</source>
          , pages
          <fpage>41</fpage>
          -
          <lpage>42</lpage>
          , New York, NY, USA,
          <year>2018</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>A.</surname>
          </string-name>
          et al, D. Bitton,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brown</surname>
          </string-name>
          , R. Catell,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>DeWitt</surname>
          </string-name>
          , D. Gawlick,
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Good</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Homan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jolls</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lukes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lazowska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nauman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spector</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Trieber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Serlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stonebraker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Reuter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          .
          <source>A Measure of Transaction Processing Power. Datamation</source>
          ,
          <volume>31</volume>
          (
          <issue>7</issue>
          ):
          <fpage>112</fpage>
          -
          <lpage>118</lpage>
          ,
          <year>April 1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ferdman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Adileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kocberber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Volos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alisafaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jevdjic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kaynak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ailamaki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Falsafi</surname>
          </string-name>
          .
          <article-title>Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware</article-title>
          .
          <source>Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems</source>
          , pages
          <fpage>37</fpage>
          -
          <lpage>48</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Garcia</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Korth</surname>
          </string-name>
          .
          <article-title>Multithreaded Architectures and the Sort Benchmark</article-title>
          .
          <source>In Proceedings of the 1st International Workshop on Data Management on New Hardware</source>
          , New York, NY, USA,
          <year>2005</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Govindaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Manocha</surname>
          </string-name>
          .
          <article-title>GPUTeraSort: High Performance Graphics Co-processor Sorting for Large Database Management</article-title>
          .
          <source>In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data</source>
          , pages
          <fpage>325</fpage>
          -
          <lpage>336</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Graefe. Sorting And Indexing With Partitioned B-Trees</surname>
          </string-name>
          .
          <source>In In Proceedings of the 1st International Conference on Innovative Data Systems Research</source>
          , Asilomar, CA, USA,
          <year>January 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Graefe</surname>
          </string-name>
          .
          <source>Implementing Sorting in Database Systems. ACM Computing Surveys</source>
          ,
          <volume>38</volume>
          (
          <issue>3</issue>
          ),
          <year>September 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nyberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Govindaraju</surname>
          </string-name>
          . Sorting Benchmark. http://sortbenchmark.org/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Helman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Bader</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ja´J´</surname>
          </string-name>
          <article-title>a. Parallel Algorithms for Personalized Communication and Sorting With an Experimental Study (Extended Abstract)</article-title>
          .
          <source>In Proceedings of the eighth annual ACM symposium on Parallel Algorithms and Architectures</source>
          , pages
          <fpage>211</fpage>
          -
          <lpage>222</lpage>
          . ACM Press,
          <year>June 1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C. A. R.</given-names>
            <surname>Hoare</surname>
          </string-name>
          . Quicksort. The
          <source>Computer Journal</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>10</fpage>
          -
          <lpage>16</lpage>
          ,
          <year>January 1962</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          . HUB:
          <article-title>Hugepage Ballooning in Kernel-based Virtual Machines</article-title>
          .
          <source>In Proceedings of the International Symposium on Memory Systems</source>
          , pages
          <fpage>31</fpage>
          -
          <lpage>37</lpage>
          , New York, NY, USA,
          <year>2018</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Intel</surname>
          </string-name>
          . Intel VTune. https://software.intel.com/en-us/vtune.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Knuth</surname>
          </string-name>
          .
          <source>The Art of Computer Programming: Sorting and Searching</source>
          , volume
          <volume>3</volume>
          .
          <string-name>
            <surname>Addison-Wesley</surname>
            <given-names>Professional</given-names>
          </string-name>
          ,
          <source>2nd edition</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Koch</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Torresen</surname>
          </string-name>
          . FPGASort.
          <source>In Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays</source>
          , pages
          <fpage>45</fpage>
          -
          <lpage>54</lpage>
          . ACM Press,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mashimo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. V.</given-names>
            <surname>Chu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Kise</surname>
          </string-name>
          .
          <article-title>High-Performance Hardware Merge Sorter</article-title>
          .
          <source>In 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE,
          <year>April 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>C.</given-names>
            <surname>Nyberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Barclay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cvetanovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gray</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Lomet</surname>
          </string-name>
          .
          <article-title>AlphaSort: A Cache-sensitive Parallel External Sort</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>4</volume>
          (
          <issue>4</issue>
          ):
          <fpage>603</fpage>
          -
          <lpage>628</lpage>
          ,
          <year>October 1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Peters</surname>
          </string-name>
          . Timsort. https://bugs.python.org/file4451/timsort.txt.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Saitoh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. V.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mashimo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Kise</surname>
          </string-name>
          .
          <article-title>A High-Performance and Cost-E↵ective Hardware Merge Sorter without Feedback Datapath</article-title>
          .
          <source>In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)</source>
          , pages
          <fpage>197</fpage>
          -
          <lpage>204</lpage>
          . IEEE,
          <year>April 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>K.</given-names>
            <surname>Thearling</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <article-title>An Improved Supercomputer Sorting Benchmark</article-title>
          .
          <source>In Proceedings of the 1992 ACM/IEEE Conference on Supercomputing</source>
          , pages
          <fpage>14</fpage>
          -
          <lpage>19</lpage>
          , Los Alamitos, CA, USA,
          <year>1992</year>
          . IEEE Computer Society Press.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>TPC</surname>
          </string-name>
          .
          <article-title>Active TPC Benchmarks</article-title>
          . http://www.tpc.org/information/benchmarks.asp.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>