<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1145/3318464.3380595</article-id>
      <title-group>
        <article-title>Techniques in Accelerating Query Processing on GPU (Lightning Talk)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jimmy Lu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Meta Platforms, Inc. Corporate</institution>
          ,
          <addr-line>1 Hacker Way, Menlo Park, CA 94025</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>Velox is a pioneering effort aimed at unifying query processing engines. This unification provides a compelling framework in which hardware accelerators can be leveraged and made available to any engine integrated with Velox. In this talk we present Velox Wave, a new framework for hardware accelerators built into Velox, and show early results illustrating the potential benefits of combining such a composable engine with GPU accelerators.</p>
      </abstract>
      <kwd-group>
        <kwd>Query Processing</kwd>
        <kwd>Hardware Accelerator</kwd>
        <kwd>GPU</kwd>
        <kwd>CUDA</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Composability of the execution layer in data management
systems allows accelerators to be integrated in a single
library, like Velox, and leveraged by many engines. The
Velox Wave hardware acceleration framework proposes
a common interface for composing operators customized
to accelerators. GPUs are among the most promising and
universally available accelerators that can be used for
query processing. For this reason, we have conducted
experiments to investigate the potential benefits of GPU
accelerators in query processing. These experiments can
also serve as a proof of concept of what the Velox Wave
framework could look like in the future.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Experiments</title>
      <p>Previous work by A. Shanbhag et al. [1] has shown the
advantages as well as the limitations of using GPUs for query
processing. In this talk, we focus on components and
techniques that were not covered in that work.</p>
      <p>The first experiment is a file reader for Alpha, Meta’s new
file format, which caters to machine learning use cases.
The implementation includes GPU decoders for 8 different
encodings and for compositions of them. A benchmark on
NVIDIA A100 cards shows promising throughput, ranging from
300 GB/s to 1000 GB/s for most of the encodings, a very large
advantage over CPU implementations.</p>
      <p>The second experiment tests various hash table
optimization options. We tested hash tables with and
without tags and with and without partitioning, and ran them
under different table sizes, load factors, and match rates.
The results show that, for a fairly large table under a heavy
workload, a partitioned GPU hash table can probe
6.8 billion rows per second, which is also very promising.</p>
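      <p>To make the idea of composing encodings in the first experiment concrete, the following is a minimal CPU-side C++ sketch, not the Alpha reader itself. The text does not list the eight encodings, so run-length and dictionary encoding are assumed here purely as familiar examples, and every name in the sketch (RleColumn, decodeRle, decodeDictionary, decodeDictionaryOverRle) is hypothetical. The run-length decoder is written so that each output position is computed independently, which is one common way to phrase such a decoder for data-parallel hardware.</p>
      <preformat preformat-type="code"><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical run-length encoded column: values[r] repeats lengths[r] times.
struct RleColumn {
  std::vector<uint32_t> values;
  std::vector<uint32_t> lengths;
};

// Decodes RLE so that every output position is computed independently.
// A running sum gives the end offset of each run; a binary search then maps
// an output position to its run. In a GPU kernel, each iteration of the
// second loop could be handled by its own thread (an assumption of this
// sketch, not a description of the Alpha decoders).
std::vector<uint32_t> decodeRle(const RleColumn& col) {
  assert(col.values.size() == col.lengths.size());
  std::vector<std::size_t> runEnds(col.lengths.size());
  std::size_t total = 0;
  for (std::size_t r = 0; r < col.lengths.size(); ++r) {
    total += col.lengths[r];
    runEnds[r] = total;
  }
  std::vector<uint32_t> out(total);
  for (std::size_t i = 0; i < total; ++i) {
    // Index of the first run whose end offset is greater than i.
    const std::size_t run = static_cast<std::size_t>(
        std::upper_bound(runEnds.begin(), runEnds.end(), i) - runEnds.begin());
    out[i] = col.values[run];
  }
  return out;
}

// Dictionary decode: each index selects one dictionary entry.
std::vector<std::string> decodeDictionary(
    const std::vector<std::string>& dictionary,
    const std::vector<uint32_t>& indices) {
  std::vector<std::string> out;
  out.reserve(indices.size());
  for (uint32_t index : indices) {
    out.push_back(dictionary.at(index));  // at() throws on an invalid index
  }
  return out;
}

// Composition: a dictionary column whose index stream is itself RLE encoded.
// The inner decoder's output feeds the outer decoder.
std::vector<std::string> decodeDictionaryOverRle(
    const std::vector<std::string>& dictionary,
    const RleColumn& encodedIndices) {
  return decodeDictionary(dictionary, decodeRle(encodedIndices));
}
```
]]></preformat>
      <p>For example, with the dictionary (ads, feed, search) and an index stream of three 2s, one 0, and two 1s, decodeDictionaryOverRle returns search, search, search, ads, feed, feed.</p>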
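      <p>The two hash table options named in the second experiment, tags and partitioning, can be illustrated with a small CPU-side C++ sketch. The layout of the hash table used in the experiment is not described here, so everything below is an assumption made for illustration: the class name PartitionedTaggedTable, the one-byte tag taken from the top hash bits, linear probing, and the choice of partition from the middle hash bits are all hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t kEmptyTag = 0;  // tag value reserved for empty slots
constexpr std::size_t kNoSlot = static_cast<std::size_t>(-1);

// 64-bit finalizer from MurmurHash3; any well-mixed hash would serve.
inline uint64_t mixHash(uint64_t x) {
  x ^= x >> 33;
  x *= 0xff51afd7ed558ccdULL;
  x ^= x >> 33;
  x *= 0xc4ceb9fe1a85ec53ULL;
  x ^= x >> 33;
  return x;
}

// Hypothetical open-addressing table split into fixed-size partitions.
// Each slot carries a one-byte tag so that most mismatches are rejected
// without reading the full key.
class PartitionedTaggedTable {
 public:
  // Both arguments must be powers of two.
  PartitionedTaggedTable(std::size_t numPartitions,
                         std::size_t slotsPerPartition)
      : numPartitions_(numPartitions),
        slotsPerPartition_(slotsPerPartition),
        tags_(numPartitions * slotsPerPartition, kEmptyTag),
        keys_(numPartitions * slotsPerPartition),
        values_(numPartitions * slotsPerPartition) {
    assert(numPartitions > 0 && (numPartitions & (numPartitions - 1)) == 0);
    assert(slotsPerPartition > 0 &&
           (slotsPerPartition & (slotsPerPartition - 1)) == 0);
  }

  // Inserts or overwrites; returns false if the key's partition is full.
  bool insert(uint64_t key, uint64_t value) {
    const uint64_t h = mixHash(key);
    const std::size_t i = findSlot(h, key);
    if (i == kNoSlot) return false;
    tags_[i] = tagOf(h);
    keys_[i] = key;
    values_[i] = value;
    return true;
  }

  // Looks up a key; on a match writes the value and returns true.
  bool probe(uint64_t key, uint64_t* value) const {
    const std::size_t i = findSlot(mixHash(key), key);
    if (i == kNoSlot || tags_[i] == kEmptyTag) return false;
    *value = values_[i];
    return true;
  }

 private:
  // Top 7 hash bits with the high bit forced on, so a tag is never kEmptyTag.
  static uint8_t tagOf(uint64_t h) {
    return static_cast<uint8_t>(0x80u | (h >> 57));
  }

  // Returns the slot holding `key`, else the first empty slot on its probe
  // path, else kNoSlot when the partition is full. Probing never leaves the
  // partition selected by the hash.
  std::size_t findSlot(uint64_t h, uint64_t key) const {
    const std::size_t mask = slotsPerPartition_ - 1;
    const std::size_t base =
        ((h >> 32) & (numPartitions_ - 1)) * slotsPerPartition_;
    const uint8_t tag = tagOf(h);
    std::size_t slot = h & mask;
    for (std::size_t n = 0; n < slotsPerPartition_; ++n) {
      const std::size_t i = base + slot;
      if (tags_[i] == kEmptyTag) return i;
      // One-byte tag comparison first; the full key is read only on a tag hit.
      if (tags_[i] == tag && keys_[i] == key) return i;
      slot = (slot + 1) & mask;
    }
    return kNoSlot;
  }

  std::size_t numPartitions_;
  std::size_t slotsPerPartition_;
  std::vector<uint8_t> tags_;
  std::vector<uint64_t> keys_;
  std::vector<uint64_t> values_;
};
```
]]></preformat>
      <p>In this sketch the tag array is stored separately from the keys, so a probe scans compact one-byte tags and touches a key only when the tag matches, while partitioning confines each probe to one contiguous region of the table. Whether the GPU implementation in the experiment makes the same choices is not stated, and the sketch says nothing about its measured throughput.</p>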
      <p>Something we have not done yet but will try soon is to
experiment with shuffling over NVLink or other HPC
interconnects. An early experiment using OpenUCX shows
that a throughput of more than 80 GB/s can be achieved
between two A100 GPU devices. The number itself is
very positive, and we are also impressed by the work done by
OpenUCX to abstract away the hardware details at the
connection level, which simplifies the implementation of exchange
operators.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusion</title>
      <p>These initial experiments suggest that GPU acceleration
can be a viable way to speed up query processing
in a composable, general, and unified manner.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>