<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Comprehensive Framework for Sorting Benchmarks</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Sergey</forename><surname>Madaminov</surname></persName>
							<email>smadaminov@cs.stonybrook.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science Stony Brook</orgName>
								<orgName type="institution">University New Computer Science Building Stony Brook</orgName>
								<address>
									<postCode>11794-2424</postCode>
									<region>New York</region>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Michael</forename><surname>Ferdman</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science Stony Brook</orgName>
								<orgName type="institution">University New Computer Science Building Stony Brook</orgName>
								<address>
									<postCode>11794-2424</postCode>
									<region>New York</region>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Comprehensive Framework for Sorting Benchmarks</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">770017D19A1A15B49BE92B340907AE7E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In the early days, sorting accounted for almost 25% of all cycles that computers were spending. That led to the development of a variety of sorting algorithms and their implementations, as well as the creation of sorting benchmarks. However, those benchmarks do not account well for increasing variability in the nature of data and they also fail to assess architectural features of di↵erent computer systems depending on the choice of the sorting algorithm. This work proposes the development of a comprehensive sorting benchmark framework to address those issues and to help with the evaluation of sorting algorithms from both software and hardware perspectives.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>Sorting is an important operation that computers have been performing from the early days <ref type="bibr" target="#b17">[18]</ref>. This led to the development of various sorting algorithms. As it has proved to be important at datacenter scale <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12]</ref> and it targeted different scenarios and systems, various algorithms were developed for general purpose sorting by using CPUs <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b4">5]</ref>, for sorting that is suitable for highly parallel systems <ref type="bibr" target="#b1">[2]</ref>, and for sorting using other types of architectures <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b18">19]</ref>. However, with the rapid pace of increase in the scale of a sorting problem, the question of which algorithm to choose remains persistent. To answer this question, one needs to have a sorting benchmark that is capable of providing enough information for analyzing the needs and e ciency of available and proposed algorithms for a given purpose.</p><p>The idea of having benchmarks is not novel and there is a body of work done on the benchmarks for system components such as CPU <ref type="bibr" target="#b5">[6]</ref>, applications such as databases <ref type="bibr" target="#b24">[25]</ref>, and systems for processing cloud workloads <ref type="bibr" target="#b7">[8]</ref>. Some existing studies have targeted sorting specifically <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b13">14]</ref>. Generally stated, the di↵erent types of benchmarks cover di↵erent parts of sorting systems from both architectural perspectives as well as algorithmic and software implementations.</p><p>In spite of the rich body of knowledge on benchmarks, drastic changes in computing today have made some of the benchmarks obsolete. For instance, benchmarks such as PennySort and TeraByte Sort are deprecated due to the substantial growth in computational power that allows handling much larger data sets <ref type="bibr" target="#b12">[13]</ref>. Similarly, the nature of the data itself may also di↵er and while there is a suggested structure of a record to sort <ref type="bibr" target="#b6">[7]</ref> that defines 100-byte records, not all studies follow it <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b22">23]</ref>. Moreover, sorting task itself can vary a lot: it can be local to a single computer machine or distributed among many nodes in a cluster, or it can target di↵erent architectures.</p><p>The variety of di↵erent factors makes it unnecessarily complicated to evaluate sorting algorithms and sorting systems and compare them against each other. Without a defined structure of data record or defined distribution, it may become non-trivial how to compare di↵erent sorting algorithms or their implementations directly. It becomes even more complicated when targeted systems are FPGAs as they may be programmed to process a very specific set of data and changes in the structure of the data may either significantly a↵ect results or make it unfeasible to even process that data.</p><p>To e↵ectively analyze the choice of a sorting algorithm or sorting system, one needs to collect both hardware and software statistics of any viable approach. While hardware statistics may include cache performance, branch misprediction, and TLB misses, the software statistics may include running time on a particular system and scalability of the sorting algorithm with the increasing number of available parallelism or growth in the volume of data.</p><p>To overcome the above issues, this work proposes developing a comprehensive framework for sorting benchmarks capable of evaluating various hardware and software aspects of sorting algorithms and sorting systems while maintaining ease of use. This work is structured as follows: Section 2 justifies the development of such a framework, Section 3 discusses framework architecture, and Section 4 concludes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">THE NEED FOR COMPREHENSIVE FRAMEWORK</head><p>last one is particularly important as there is a number of studies targeting various architectures such as GPUs <ref type="bibr" target="#b9">[10]</ref>, FPGAs <ref type="bibr" target="#b18">[19]</ref>, and AVX-based <ref type="bibr" target="#b3">[4]</ref>. But without a systematic approach, the task of comparing them against each other becomes quite challenging. This task of comparing di↵erent architectures between themselves especially complicated when only part of the sorting algorithm is implemented. For example, some studies targeting FPGAs focus on the merging <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b22">23]</ref>. As such implementations may require data transfer to and from the sorting system, some level of data preparation, or may depend on the problem size, it is unclear how to compare results obtained from di↵erent architectures. Thus, the proposed framework should provide a facility to perform a comparison between them. For similar sorting algorithms, it can be achieved by direct comparison of similar phases of the algorithms and estimating the remaining phases, which may include potentially required communication such as data transfer over the PCIe or another medium.</p><p>Many studies related to sorting use record structure suggested by Datamation sorting benchmark <ref type="bibr" target="#b6">[7]</ref>, but it is not universally accepted. Due to variations in record structure, comparing the results of di↵erent studies directly is not straightforward. On the other hand, the Datamation sorting benchmark that defines the structure of a data record being 100-byte with ten-byte key and ninety-byte value could have become outdated. The current database vendors and users should be surveyed to collect prevailing structures of records and data distributions. However, as some works may use the di↵erent input data, it is important to allow variations in the input data. First, it will allow analyzing studies that use di↵erent input data. Second, it will enable the comparison with prior work.</p><p>It deems important to understand how sorting algorithms scale with an increasing number of parallelism or volume of data, which requires collecting corresponding information. To perform a more thorough evaluation of the sorting algorithm it is crucial to collect systems statistics such as memory bandwidth and caches miss rate. While it is possible to use existing tools for profiling, it requires the algorithm developer to install and learn a variety of tools. It can be avoided by adding such functionality into the framework itself. Some of the algorithms exhibit di↵erent behavior on systems level, e.g., Quicksort algorithm is known for good cache behavior and utilization. Gathering more information can help to get a clear picture of the sorting algorithm, which in turn can help to reason about the di↵erences between different sorting algorithms. We suggest that the framework should not just provide statistical data as feedback, but also provide an analysis report that identifies weak points of the algorithm and what potentially can be improved. Moreover, modern benchmark systems are not easy to use. Thus, the proposed framework should be user-friendly and should provide reports for further analysis in a readable format.</p><p>With a variety of studies on sorting including recent works on exploring new computer architectures such as FPGAs <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b22">23]</ref> and GPUs <ref type="bibr" target="#b9">[10]</ref> and their suitability for sorting, comparing their result becomes a challenging task. The proposed framework will strive to address these challenges and needs while maintaining ease of use. It may be still unclear how to compare di↵erent computer architectures but this work sets resolving this problem as one of its targets. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">FRAMEWORK ARCHITECTURE</head><p>This section provides a brief overview of potential framework components and argues for their need. It discusses various aspects of proposed framework such as data distribution, collectible system statistics, and some of the other aspects that include record structure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Data Distribution</head><p>The record generator used in the Sort Benchmark <ref type="bibr" target="#b12">[13]</ref> can produce two types of data distribution. Despite a bigger variety of data being considered by Helman et al. <ref type="bibr" target="#b13">[14]</ref>, their work focuses on the structure of the data rather than its distribution. Based on the nature of the data, it is possible to have more options in the data distribution and the proposed framework should account for both data structure and data distribution. Often, these two features may be independent of each other so the framework should provide facilities to combine them together. Thus, it may become possible to have both staggered data structure with Gaussian distribution or any other combination of data structure and data distribution. Table <ref type="table" target="#tab_0">1</ref> provides the list of some of the possible distributions to account for. However, similar to the Datamation <ref type="bibr" target="#b6">[7]</ref>, a comprehensive list should be compiled using input from the database vendors and the database users to represent the actual workloads that may be found outside of research groups.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">System Statistics</head><p>Currently, to assess systems performance such as memory bandwidth, the developer has to use tools such as Intel VTune <ref type="bibr" target="#b16">[17]</ref>. While in some cases it may be inevitable to use external software, the framework should collect statistics where it can and at least provide the list of various metrics to account for. Table <ref type="table" target="#tab_1">2</ref> provides the list of some of the suggested systems statistics to collect.</p><p>Many modern sorting algorithms have optimal or nearoptimal complexity, but real implementations may result in noticeable di↵erences between them. Collecting such statistics may help to identify bottlenecks that may lead to further research on how those bottlenecks can be mitigated. As a naïve example, hugepages may help to reduce TLB misses <ref type="bibr" target="#b15">[16]</ref> and using recently introduced high-bandwidth memories may help to handle the memory bandwidth bound parts of the sorting algorithms. Moreover, identifying such bottlenecks may steer hardware research. One can imagine building a sorting specific accelerator to overcome them. For example, it may be an FPGA that accelerates a particular task or even a special-purpose processor that has an ISA targeting the sorting task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Miscellaneous</head><p>The data distribution and systems statistics cover many di↵erent aspects of sorting but there are still some implementation details and guidelines that may become useful for </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">CONCLUSIONS</head><p>This work advocates for the development of a comprehensive framework for sorting benchmarks, which accounts for various aspects of sorting algorithms starting with defining the input data and measures both their software and hardware statistics. Such a framework may help to create a system to foster the development of sorting algorithms as well as designing new computer architectures for sorting. We envision that it will be beneficial for many communities outside of a group of scientists who work on the development of new sorting algorithms or modifying the existing ones. While the work in its preliminary stage, there are many design choices that have to be done and collecting feedback from database vendors and users is essential for what are the common data features and hardware statistics they do care.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Example of the data distribution.</figDesc><table><row><cell>Uniform</cell><cell>Bernoulli</cell></row><row><cell>Poisson</cell><cell>Exponential</cell></row><row><cell cols="2">Gaussian Log Normal</cell></row><row><cell>Gamma</cell><cell>Beta</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Example of the collectible system statistics.</figDesc><table><row><cell>I/O Intensity</cell><cell>TLB Miss Rate</cell></row><row><cell>IPC Intensity</cell><cell>Caches Miss Rate</cell></row><row><cell>Cache Utilization</cell><cell>Branch Misprediction</cell></row><row><cell>Memory Bandwidth</cell><cell>Memory Peak B/W</cell></row><row><cell cols="2">algorithm developers. They include using custom compara-</cell></row><row><cell cols="2">tors, avoiding using indirect function calls [1], and di↵erent</cell></row><row><cell cols="2">record types with latter being tightly coupled with custom</cell></row><row><cell cols="2">comparators. Ultimately, evaluation of sorting algorithms</cell></row><row><cell cols="2">and sorting systems may have more factors to consider that</cell></row><row><cell cols="2">we have previously defined and it is deemed important to</cell></row><row><cell cols="2">identify them and leave the framework open to including</cell></row><row><cell>them.</cell><cell></cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Architectural Support for Dynamic Linking</title>
		<author>
			<persName><forename type="first">V</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dabral</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Palit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ferdman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems</title>
				<meeting>the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="691" to="702" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Practical Massively Parallel Sorting</title>
		<author>
			<persName><forename type="first">M</forename><surname>Axtmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Bingmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sanders</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Schulz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th ACM on symposium on Parallelism in Algorithms and Architectures</title>
				<meeting>the 27th ACM on symposium on Parallelism in Algorithms and Architectures</meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2015-06">June 2015</date>
			<biblScope unit="page" from="13" to="23" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The NAS Parallel Benchmarks -Summary and Preliminary Results</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">H</forename><surname>Bailey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Barszcz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Barton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Browning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Carter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Dagum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Fatoohi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">O</forename><surname>Frederickson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A</forename><surname>Lasinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Schreiber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">D</forename><surname>Simon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Venkatakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Weeratunga</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1991 ACM/IEEE Conference on Supercomputing</title>
				<meeting>the 1991 ACM/IEEE Conference on Supercomputing<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1991">1991</date>
			<biblScope unit="page" from="158" to="165" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake</title>
		<author>
			<persName><forename type="first">B</forename><surname>Bramas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Advanced Computer Science and Applications</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">10</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Merge Sort Algorithm [M1</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bron</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="357" to="358" />
			<date type="published" when="1972-05">May 1972</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">SPEC CPU2017: Next-Generation Compute Benchmark</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bucek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-D</forename><surname>Lange</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">V</forename><surname>Kistowski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Companion of the 2018 ACM/SPEC International Conference on Performance Engineering</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="41" to="42" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A Measure of Transaction Processing Power</title>
		<author>
			<persName><forename type="first">A</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Datamation</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="112" to="118" />
			<date type="published" when="1985-04">April 1985</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ferdman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Adileh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kocberber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Volos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Alisafaee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jevdjic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kaynak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Popescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ailamaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Falsafi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems</title>
				<meeting>the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="37" to="48" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Multithreaded Architectures and the Sort Benchmark</title>
		<author>
			<persName><forename type="first">P</forename><surname>Garcia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Korth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Workshop on Data Management on New Hardware</title>
				<meeting>the 1st International Workshop on Data Management on New Hardware<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">GPUTeraSort: High Performance Graphics Co-processor Sorting for Large Database Management</title>
		<author>
			<persName><forename type="first">N</forename><surname>Govindaraju</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Manocha</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data</title>
				<meeting>the 2006 ACM SIGMOD International Conference on Management of Data<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="325" to="336" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Sorting And Indexing With Partitioned B-Trees</title>
		<author>
			<persName><forename type="first">G</forename><surname>Graefe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Conference on Innovative Data Systems Research</title>
				<meeting>the 1st International Conference on Innovative Data Systems Research<address><addrLine>Asilomar, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003-01">January 2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Implementing Sorting in Database Systems</title>
		<author>
			<persName><forename type="first">G</forename><surname>Graefe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="issue">3</biblScope>
			<date type="published" when="2006-09">September 2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Nyberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Govindaraju</surname></persName>
		</author>
		<ptr target="http://sortbenchmark.org/" />
		<title level="m">Sorting Benchmark</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Parallel Algorithms for Personalized Communication and Sorting With an Experimental Study (Extended Abstract)</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Helman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Bader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jájá</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the eighth annual ACM symposium on Parallel Algorithms and Architectures</title>
				<meeting>the eighth annual ACM symposium on Parallel Algorithms and Architectures</meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="1996-06">June 1996</date>
			<biblScope unit="page" from="211" to="222" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Quicksort</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A R</forename><surname>Hoare</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Computer Journal</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="10" to="16" />
			<date type="published" when="1962-01">January 1962</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">HUB: Hugepage Ballooning in Kernel-based Virtual Machines</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Symposium on Memory Systems</title>
				<meeting>the International Symposium on Memory Systems<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="31" to="37" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<ptr target="https://software.intel.com/en-us/vtune" />
		<title level="m">Intel VTune</title>
				<imprint/>
		<respStmt>
			<orgName>Intel</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">The Art of Computer Programming: Sorting and Searching</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Knuth</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<publisher>Addison-Wesley Professional</publisher>
			<biblScope unit="volume">3</biblScope>
		</imprint>
	</monogr>
	<note>2nd edition</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">FPGASort</title>
		<author>
			<persName><forename type="first">D</forename><surname>Koch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Torresen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays</title>
				<meeting>the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays</meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="45" to="54" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">High-Performance Hardware Merge Sorter</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mashimo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">V</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kise</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2017-04">2017. April 2017</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">AlphaSort: A Cache-sensitive Parallel External Sort</title>
		<author>
			<persName><forename type="first">C</forename><surname>Nyberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Barclay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Cvetanovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lomet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The VLDB Journal</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="603" to="628" />
			<date type="published" when="1995-10">October 1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Peters</surname></persName>
		</author>
		<ptr target="https://bugs.python.org/file4451/timsort.txt" />
		<title level="m">Timsort</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">A High-Performance and Cost-E↵ective Hardware Merge Sorter without Feedback Datapath</title>
		<author>
			<persName><forename type="first">M</forename><surname>Saitoh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Elsayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">V</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mashimo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kise</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018-04">2018. April 2018</date>
			<biblScope unit="page" from="197" to="204" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">An Improved Supercomputer Sorting Benchmark</title>
		<author>
			<persName><forename type="first">K</forename><surname>Thearling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Smith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1992 ACM/IEEE Conference on Supercomputing</title>
				<meeting>the 1992 ACM/IEEE Conference on Supercomputing<address><addrLine>Los Alamitos, CA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society Press</publisher>
			<date type="published" when="1992">1992</date>
			<biblScope unit="page" from="14" to="19" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<ptr target="http://www.tpc.org/information/benchmarks.asp" />
		<title level="m">Active TPC Benchmarks</title>
				<imprint/>
		<respStmt>
			<orgName>TPC</orgName>
		</respStmt>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
