<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Demo of VisBooster: Accelerating Tableau Live Mode Queries Up to 100 Times Faster</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qiushi Bai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sadeem Alsudais</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chen Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of California</institution>
          ,
          <addr-line>Irvine, CA 92697</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <issue>2022</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>We propose a middleware-based query rewriting framework called VisBooster to accelerate visualization queries formulated by Tableau in its live mode. VisBooster intercepts SQL queries by customizing JDBC drivers used by Tableau and uses rules to rewrite the queries to semantically equivalent yet more eficient queries. The rewriting rules are designed by data experts who analyze slow queries and apply their domain knowledge and optimization expertise. We demonstrate that VisBooster can accelerate visualization queries formulated by Tableau up to 100 times faster.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;tableau</kwd>
        <kwd>visualization</kwd>
        <kwd>middleware</kwd>
        <kwd>query rewrite</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Recently, various techniques have been proposed to
accelerate visualization queries.</p>
      <p>Pre-computation-based approaches. Datacube
techniques [7, 8] that predefine cube intervals cannot support
visualization queries with arbitrary numerical range
conditions, while VisBooster does not have such restrictions (5) Result (4) Result
on query shapes. View-based techniques such as [9, 10] JDBC
that utilize materialized results to accelerate queries re- (1) Query Q Driver (3Q)RueerwyriQtte'n Database
quire additional storage overhead. Prefetching-based
teercahtenivqiusuesalsiuzcahtioans [q1u1e,r1i2e]s.uVtiilsizBeocoasctheerdisreosruthltosgtoonaaclcetol- (EUmsielyr) (2) Rewrite (Q6u)eSrlioews
these techniques, and it can be used together with them Query Rewriter (7) Rewriting
to further improve query performance. Rules</p>
      <p>Query rewriting-based approaches. Bao [13] is a re- Rewriting Rules
cent technique that learns to choose a good hint for a Tableau Live Mode VisBooster DB Expert (Bob)
query to accelerate query execution. The proposed
VisBooster framework is not limited to any specific type Figure 1: Overview of the VisBooster framework.
of rewriting rules. For rewriting rules that involve only
query hints, VisBooster can also use Bao as part of the With VisBooster, after step 1, the modified JDBC driver
rewriting process. In addition, database vendors provide invokes the Query Rewriter to rewrite  to a
semantiquery rewriting functionalities (e.g., PostgreSQL [14] and cally equivalent but more eficient SQL query, denoted as
MySQL [15]). These functionalities allow users to define ′ (step 2). The new query ′ is sent to the database for
their own rewriting rules, and the database will automat- execution (step 3). A challenge of the rule-based query
ically use these rules to rewrite queries. However, these rewriter is that for a given query , it needs to decide
functionalities only support cases where a materialized what rewriting rules should be applied to generate a new
view can be used, while the proposed VisBooster can query ′. In the VisBooster framework, a database
exsupport more rewriting cases where a more eficient plan pert analyzes the log of slow queries from the backend
is desired. database (step 6). By using the knowledge about the</p>
      <p>Compared to these existing techniques, VisBooster has database, the expert identifies rewriting rules and adds
the following uniqueness: them to the query rewriter (step 7). With the
rewriting rules applied to new visualization queries and more
• It adopts a human-in-the-loop approach that fully slow queries identified by the expert, the framework can
leverages human intelligence, including domain- iteratively improve the performance of queries.
specific knowledge and database-optimization
expertise.
• It treats both the application (Tableau) and the 4. Demonstration Scenarios
backend database as black boxes and requires no
code modification. It can significantly improve In this section, we present several scenarios to
demonthe user experience in the Tableau live mode with- strate the eficacy of the VisBooster framework for
variout changing the existing software and services. ous visualization queries on diferent databases, including
• It is a general framework that supports many PostgreSQL and MySQL.</p>
      <p>types of rewriting rules such as predicate
transformation, query hints, and etc.</p>
      <p>4.1. Case 1: Accelerating choropleth map
queries on PostgreSQL</p>
    </sec>
    <sec id="sec-2">
      <title>3. VisBooster Overview</title>
      <p>In case 1, suppose Emily is a data analyst who wants to
study the temporal and spatial distributions of social
media discussions about iPhone using a table of 30 million
tweets. She builds a dashboard in Tableau connected to
PostgreSQL in the live mode.</p>
      <p>The dashboard in Figure 2 consists of three modules.</p>
      <p>On the top is a textual filter that allows users to input a
keyword and filters tweets containing the keyword as a
substring in their texts. A state-level choropleth map in
the middle shows a geo spatial distribution of the filtered
tweets grouped by states. A quarter-level bar chart at
the bottom shows a temporal distribution of the filtered
tweets grouped by quarters. Both the choropleth map
and the bar chart change as the user further explores
the results. For example, after the user types in iphone
sulting in a long execution time of 32 seconds, as shown
in Figure 5(a). The long delay of the dashboard queries
significantly impairs Emily’s productivity, so she asks a
database expert, Bob, for help.</p>
      <p>After analyzing the original query sent to PostgreSQL
and the physical plan, Bob realizes that those type-casting
expressions (e.g., CAST(“created_at” AS DATE))
prevent PostgreSQL from choosing a more eficient
indexscan plan. The reason why Tableau adds those
typecasting expressions is to prevent computational overflow
errors [16]. However, in this case, with the knowledge
about the underlying data, Bob can remove those
typecasting expressions without worrying about such failures.</p>
      <p>Thus, he introduces the following rewriting rule:</p>
      <p>Rule-1: CAST( ⟨exp⟩ AS ⟨type⟩ ) ⇒ ⟨exp⟩ .</p>
      <p>After applying this rule on 1, we have the new query
′1 in Figure 4.
in the input box, the bar chart’s first three quarters in SELECT SUM(1) AS "cnt:tweets",
2017 attract Emily’s attention due to their higher number CAST("state_name" AS TEXT) AS "state_name"
of results than other quarters. She selects those quar- FROM "tweets"
ters in the bar chart to zoom into the tweets published WHERE DATE_TRUNC('QUARTER', "created_at") IN (
within the time range. As Figure 2 shows, tweets are (TIMESTAMP '2017-01-01 00:00:00.000'),
selected using two conditions: text contains iphone and ((TTIIMMEESSTTAAMMPP ''22001177--0074--0011 0000::0000::0000..000000'')),)
published within the first three quarters of 2017. The AND STRPOS(LOWER("text"), 'iphone') &gt; 0
choropleth map is redrawn based on the newly filtered GROUP BY 2;
results. Tableau formulates a corresponding SQL query
for each interaction and sends it to the database to com- Figure 4: The rewritten SQL query ′1 with Rule-1 applied
pute the visualization result. The corresponding query on 1. (Blue shows the modifications.)
for the current choropleth map is shown in Figure 3.</p>
      <p>SELECT SUM(1) AS "cnt:tweets",</p>
      <p>CAST("state_name" AS TEXT) AS "state_name"
FROM "tweets"
WHERE CAST( DATE_TRUNC('QUARTER',</p>
      <p>CAST("created_at" AS DATE))
AS DATE) IN (
(TIMESTAMP '2017-01-01 00:00:00.000'),
(TIMESTAMP '2017-04-01 00:00:00.000'),
(TIMESTAMP '2017-07-01 00:00:00.000'))
AND STRPOS(CAST(LOWER(</p>
      <p>CAST(CAST("text" AS TEXT) AS TEXT))
AS TEXT),</p>
      <p>CAST('iphone' AS TEXT) ) &gt; 0
GROUP BY 2;</p>
      <p>As expected, PostgreSQL generates a much more
efifcient plan using the B+ tree as shown in Figure 5(b),
which takes 10 seconds to compute the results.</p>
      <p>Rewriting rule 2: Replacing substring match with
LIKE. Seeing the 10s response time, Emily is still not
satisfied with the performance for the interactive
visualization frontend. After a closer look at the available indexes,
Bob finds that a trigram index on the “text” attribute in
PostgreSQL supports wildcard filtering predicates such
as LIKE and ILIKE. However, PostgreSQL fails to use
this index because the wildcard predicate formulated by
Tableau is STRPOS(LOWER(“text”), ’iphone’) &gt;
0, which is equivalent to “text” ILIKE ’iphone’. To
address this issue, Bob introduces another rewriting rule:</p>
      <p>Rule-2: STRPOS(LOWER( ⟨exp⟩ ), ‘ ⟨literal ⟩ ’ ) &gt; 0
Rewriting rule 1: Removing type casting. Although ⇒ ⟨exp⟩ ILIKE ‘% ⟨literal ⟩ %’.
there is a B+ tree index on the attribute created_at
in the filtering expression DATE_TRUNC(’QUARTER’, With both rules applied to 1, we obtain a new
rewrit“created_at”), PostgreSQL generates a physical plan ten query ′1′ shown in Figure 6. For the new query,
that does a sequential scan instead of using the index, re- PostgreSQL chooses to use both indexes as shown in</p>
      <p>Q1 Plan (32s 046ms)
Finalize GroupAggregate
by state_name
100x faster
3x faster</p>
      <p>Q1' Plan (10s 992ms)
Finalize GroupAggregate
by state_name
Gather Merge</p>
      <p>Gather Merge
Partial GroupAggregate
by state_name</p>
      <p>Sort
by state_name</p>
      <p>Parellel Seq Scan
on tweets
duration: 32s 020ms</p>
      <p>Apply Rule-1</p>
      <p>Apply Rule-2
Partial GroupAggregate
by state_name</p>
      <p>Sort
by state_name</p>
      <p>Parellel Bitmap Heap Scan
on tweets
duration: 10s 773ms</p>
      <p>Bitmap Index Scan
using index on created_at</p>
      <p>Q1'' Plan (280ms)
GroupAggregate
by state_name</p>
      <p>Sort
by state_name</p>
      <p>Bitmap Heap Scan
on tweets</p>
      <p>Bitmap And</p>
      <p>Bitmap Index Scan
using index on text
duration: 62.6ms
Bitmap Index Scan
using index on created_at
duration: 155ms
(a)
(b)
(c)
greSQL always picks one of the two available indexes
for scanning instead of doing an intersection after two
index scans. In addition, for the specific 2, using the B+
tree index on created_at is more eficient than using
the index on text. Thus, Bob introduces the third rule,
which adds a hint to 2 to suggest PostgreSQL to use
the B+ tree on created_at. The new rewritten query
′2′′ is shown in Figure 9.</p>
      <p>The execution time of query ′2′′ is 2.5 seconds, more
than two times faster than the previous rewritten query
′2′. These examples show that by rewriting queries
formulated by the Tableau live mode, VisBooster can
significantly reduce the response time of visualization queries
and improve the user experience.
In case 2, we use a similar example as case 1 to show how
the proposed framework accelerates visualization queries
formulated by Tableau in its live mode connecting to a
MySQL database. We first show that a rule similar to
Rule-1 for PostgreSQL also applies to MySQL, i.e.,
removing type-casting expressions in the filtering clauses can
help the database generate an index-based plan.
However, for MySQL, the syntax of type-casting expressions
formulated by Tableau difers from that for PostgreSQL.</p>
      <p>In the demonstration, we will use a scatterplot query to
show how the database expert Bob introduces a rule to
remove type-casting. Diferent from PostgreSQL, MySQL
does not support trigram indexes. Thus for such queries,
a rule similar to Rule-2 does not help for MySQL. This
observation also shows the value of human involvement
in the proposed framework because human knowledge
about a specific database should be considered in the life
cycle of query rewriting.
[2] slintel, Tableau software market share, 2021. URL:
https://www.slintel.com/tech/data-visualization/
tableau-software-market-share, last accessed
2022-01-25.
[3] Tableau, Tableau online tips: Extracts, live
connections, and cloud data, 2021. URL:
https://www.tableau.com/about/blog/2016/4/
tableau-online-tips-extracts-live-connectionscloud-data-53351, last accessed 2022-01-25.
[4] W. et al., An analytic data engine for visualization
in tableau, in: SIGMOD 2011, Athens, Greece, June
12-16, 2011, ACM, 2011, pp. 1185–1194.
[5] G. Lohman, Is query optimization a “solved”
prob</p>
      <p>lem?, ACM SIGMOD Blog. ACM Blog (2014).
[6] Z. Liu, J. Heer, The efects of interactive latency
on exploratory visual analysis, IEEE Trans. Vis.</p>
      <p>Comput. Graph. 20 (2014) 2122–2131.
[7] C. A. Pahins, S. A. Stephens, et al., Hashedcubes:</p>
      <p>Simple, low memory, real-time visual exploration
of big data, IEEE Trans. Vis. Comput. Graph. 23
(2017) 671–680.
[8] A. C. et al., The case for interactive data exploration
accelerators (ideas), in: HILDA@SIGMOD 2016,
San Francisco, CA, USA, June 26 - July 01, 2016,</p>
      <p>ACM, 2016, p. 11.
[9] S. A. et al, Automated selection of materialized
views and indexes in SQL databases, in: VLDB
2000, September 10-14, 2000, Cairo, Egypt, Morgan</p>
      <p>Kaufmann, 2000, pp. 496–505.
[10] J. Goldstein, P. Larson, Optimizing queries using
materialized views: A practical, scalable solution,
in: SIGMOD 2001, Santa Barbara, CA, USA, May
21-24, 2001, 2001, pp. 331–342.
[11] L. B. et al, Dynamic prefetching of data tiles for
interactive visualization, in: SIGMOD 2016, San
Francisco, CA, USA, June 26 - July 01, 2016, ACM,
2016, pp. 1363–1375.
[12] L. Battle, Behavior-driven optimization techniques
for scalable data exploration, Ph.D. thesis, MIT,</p>
      <p>Cambridge, USA, 2017.
[13] R. M. et al., Bao: Making learned query optimization
practical, in: SIGMOD ’21, China, June 20-25, 2021,</p>
      <p>ACM, 2021, pp. 1275–1288.
[14] Oracle, Postgresql 14: Chapter 41. the rule
system, 2021. URL: https://www.postgresql.org/docs/
current/rules.html, last accessed 2022-01-25.
[15] Oracle, Mysql 8.0: Using the rewriter
query rewrite plugin, 2021. URL: https:
//dev.mysql.com/doc/refman/8.0/en/rewriterquery-rewrite-plugin-usage.html, last accessed
2022-01-25.
[1] C. S. et al., Polaris: A system for query, analy- [16] A. V. et al., Get real: How benchmarks fail to
repsis, and visualization of multidimensional relational resent the real world, in: DBTest@SIGMOD 2018,
databases, IEEE Trans. Vis. and Comput. Graph. 8 Houston, TX, USA, June 15, 2018, ACM, 2018, pp.
(2002) 52–65. 1:1–1:6.</p>
    </sec>
    <sec id="sec-3">
      <title>5. Open Problems</title>
      <p>There are open research challenges related to using
rewriting rules in VisBooster to solve the scalable in-situ
data visualization problem in this middleware
architecture. For example, how to help the database experts
identify the rewriting rules? How and when to apply them
(e.g., diferent query hints should be applied to diferent
queries)? In what order should the rules be applied? We
plan to address these challenges in our future work.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>