1. Introduction

April

Demo of VisBooster: Accelerating Tableau Live Mode Queries Up to 100 Times Faster

Qiushi Bai

Sadeem Alsudais

Chen Li

0 0 University of California , Irvine, CA 92697 , USA

2022

1 2022 0000 0001

We propose a middleware-based query rewriting framework called VisBooster to accelerate visualization queries formulated by Tableau in its live mode. VisBooster intercepts SQL queries by customizing JDBC drivers used by Tableau and uses rules to rewrite the queries to semantically equivalent yet more eficient queries. The rewriting rules are designed by data experts who analyze slow queries and apply their domain knowledge and optimization expertise. We demonstrate that VisBooster can accelerate visualization queries formulated by Tableau up to 100 times faster.

eol>tableau visualization middleware query rewrite

1. Introduction

Recently, various techniques have been proposed to accelerate visualization queries.

Pre-computation-based approaches. Datacube techniques [7, 8] that predefine cube intervals cannot support visualization queries with arbitrary numerical range conditions, while VisBooster does not have such restrictions (5) Result (4) Result on query shapes. View-based techniques such as [9, 10] JDBC that utilize materialized results to accelerate queries re- (1) Query Q Driver (3Q)RueerwyriQtte'n Database quire additional storage overhead. Prefetching-based teercahtenivqiusuesalsiuzcahtioans [q1u1e,r1i2e]s.uVtiilsizBeocoasctheerdisreosruthltosgtoonaaclcetol- (EUmsielyr) (2) Rewrite (Q6u)eSrlioews these techniques, and it can be used together with them Query Rewriter (7) Rewriting to further improve query performance. Rules

Query rewriting-based approaches. Bao [13] is a re- Rewriting Rules cent technique that learns to choose a good hint for a Tableau Live Mode VisBooster DB Expert (Bob) query to accelerate query execution. The proposed VisBooster framework is not limited to any specific type Figure 1: Overview of the VisBooster framework. of rewriting rules. For rewriting rules that involve only query hints, VisBooster can also use Bao as part of the With VisBooster, after step 1, the modified JDBC driver rewriting process. In addition, database vendors provide invokes the Query Rewriter to rewrite to a semantiquery rewriting functionalities (e.g., PostgreSQL [14] and cally equivalent but more eficient SQL query, denoted as MySQL [15]). These functionalities allow users to define ′ (step 2). The new query ′ is sent to the database for their own rewriting rules, and the database will automat- execution (step 3). A challenge of the rule-based query ically use these rules to rewrite queries. However, these rewriter is that for a given query , it needs to decide functionalities only support cases where a materialized what rewriting rules should be applied to generate a new view can be used, while the proposed VisBooster can query ′. In the VisBooster framework, a database exsupport more rewriting cases where a more eficient plan pert analyzes the log of slow queries from the backend is desired. database (step 6). By using the knowledge about the

Compared to these existing techniques, VisBooster has database, the expert identifies rewriting rules and adds the following uniqueness: them to the query rewriter (step 7). With the rewriting rules applied to new visualization queries and more • It adopts a human-in-the-loop approach that fully slow queries identified by the expert, the framework can leverages human intelligence, including domain- iteratively improve the performance of queries. specific knowledge and database-optimization expertise. • It treats both the application (Tableau) and the 4. Demonstration Scenarios backend database as black boxes and requires no code modification. It can significantly improve In this section, we present several scenarios to demonthe user experience in the Tableau live mode with- strate the eficacy of the VisBooster framework for variout changing the existing software and services. ous visualization queries on diferent databases, including • It is a general framework that supports many PostgreSQL and MySQL.

types of rewriting rules such as predicate transformation, query hints, and etc.

4.1. Case 1: Accelerating choropleth map queries on PostgreSQL

3. VisBooster Overview

In case 1, suppose Emily is a data analyst who wants to study the temporal and spatial distributions of social media discussions about iPhone using a table of 30 million tweets. She builds a dashboard in Tableau connected to PostgreSQL in the live mode.

The dashboard in Figure 2 consists of three modules.

On the top is a textual filter that allows users to input a keyword and filters tweets containing the keyword as a substring in their texts. A state-level choropleth map in the middle shows a geo spatial distribution of the filtered tweets grouped by states. A quarter-level bar chart at the bottom shows a temporal distribution of the filtered tweets grouped by quarters. Both the choropleth map and the bar chart change as the user further explores the results. For example, after the user types in iphone sulting in a long execution time of 32 seconds, as shown in Figure 5(a). The long delay of the dashboard queries significantly impairs Emily’s productivity, so she asks a database expert, Bob, for help.

After analyzing the original query sent to PostgreSQL and the physical plan, Bob realizes that those type-casting expressions (e.g., CAST(“created_at” AS DATE)) prevent PostgreSQL from choosing a more eficient indexscan plan. The reason why Tableau adds those typecasting expressions is to prevent computational overflow errors [16]. However, in this case, with the knowledge about the underlying data, Bob can remove those typecasting expressions without worrying about such failures.

Thus, he introduces the following rewriting rule:

Rule-1: CAST( ⟨exp⟩ AS ⟨type⟩ ) ⇒ ⟨exp⟩ .

After applying this rule on 1, we have the new query ′1 in Figure 4. in the input box, the bar chart’s first three quarters in SELECT SUM(1) AS "cnt:tweets", 2017 attract Emily’s attention due to their higher number CAST("state_name" AS TEXT) AS "state_name" of results than other quarters. She selects those quar- FROM "tweets" ters in the bar chart to zoom into the tweets published WHERE DATE_TRUNC('QUARTER', "created_at") IN ( within the time range. As Figure 2 shows, tweets are (TIMESTAMP '2017-01-01 00:00:00.000'), selected using two conditions: text contains iphone and ((TTIIMMEESSTTAAMMPP ''22001177--0074--0011 0000::0000::0000..000000'')),) published within the first three quarters of 2017. The AND STRPOS(LOWER("text"), 'iphone') > 0 choropleth map is redrawn based on the newly filtered GROUP BY 2; results. Tableau formulates a corresponding SQL query for each interaction and sends it to the database to com- Figure 4: The rewritten SQL query ′1 with Rule-1 applied pute the visualization result. The corresponding query on 1. (Blue shows the modifications.) for the current choropleth map is shown in Figure 3.

SELECT SUM(1) AS "cnt:tweets",

CAST("state_name" AS TEXT) AS "state_name" FROM "tweets" WHERE CAST( DATE_TRUNC('QUARTER',

CAST("created_at" AS DATE)) AS DATE) IN ( (TIMESTAMP '2017-01-01 00:00:00.000'), (TIMESTAMP '2017-04-01 00:00:00.000'), (TIMESTAMP '2017-07-01 00:00:00.000')) AND STRPOS(CAST(LOWER(

CAST(CAST("text" AS TEXT) AS TEXT)) AS TEXT),

CAST('iphone' AS TEXT) ) > 0 GROUP BY 2;

As expected, PostgreSQL generates a much more efifcient plan using the B+ tree as shown in Figure 5(b), which takes 10 seconds to compute the results.

Rewriting rule 2: Replacing substring match with LIKE. Seeing the 10s response time, Emily is still not satisfied with the performance for the interactive visualization frontend. After a closer look at the available indexes, Bob finds that a trigram index on the “text” attribute in PostgreSQL supports wildcard filtering predicates such as LIKE and ILIKE. However, PostgreSQL fails to use this index because the wildcard predicate formulated by Tableau is STRPOS(LOWER(“text”), ’iphone’) > 0, which is equivalent to “text” ILIKE ’iphone’. To address this issue, Bob introduces another rewriting rule:

Rule-2: STRPOS(LOWER( ⟨exp⟩ ), ‘ ⟨literal ⟩ ’ ) > 0 Rewriting rule 1: Removing type casting. Although ⇒ ⟨exp⟩ ILIKE ‘% ⟨literal ⟩ %’. there is a B+ tree index on the attribute created_at in the filtering expression DATE_TRUNC(’QUARTER’, With both rules applied to 1, we obtain a new rewrit“created_at”), PostgreSQL generates a physical plan ten query ′1′ shown in Figure 6. For the new query, that does a sequential scan instead of using the index, re- PostgreSQL chooses to use both indexes as shown in

Q1 Plan (32s 046ms) Finalize GroupAggregate by state_name 100x faster 3x faster

Q1' Plan (10s 992ms) Finalize GroupAggregate by state_name Gather Merge

Gather Merge Partial GroupAggregate by state_name

Sort by state_name

Parellel Seq Scan on tweets duration: 32s 020ms

Apply Rule-1

Apply Rule-2 Partial GroupAggregate by state_name

Sort by state_name

Parellel Bitmap Heap Scan on tweets duration: 10s 773ms

Bitmap Index Scan using index on created_at

Q1'' Plan (280ms) GroupAggregate by state_name

Sort by state_name

Bitmap Heap Scan on tweets

Bitmap And

Bitmap Index Scan using index on text duration: 62.6ms Bitmap Index Scan using index on created_at duration: 155ms (a) (b) (c) greSQL always picks one of the two available indexes for scanning instead of doing an intersection after two index scans. In addition, for the specific 2, using the B+ tree index on created_at is more eficient than using the index on text. Thus, Bob introduces the third rule, which adds a hint to 2 to suggest PostgreSQL to use the B+ tree on created_at. The new rewritten query ′2′′ is shown in Figure 9.

The execution time of query ′2′′ is 2.5 seconds, more than two times faster than the previous rewritten query ′2′. These examples show that by rewriting queries formulated by the Tableau live mode, VisBooster can significantly reduce the response time of visualization queries and improve the user experience. In case 2, we use a similar example as case 1 to show how the proposed framework accelerates visualization queries formulated by Tableau in its live mode connecting to a MySQL database. We first show that a rule similar to Rule-1 for PostgreSQL also applies to MySQL, i.e., removing type-casting expressions in the filtering clauses can help the database generate an index-based plan. However, for MySQL, the syntax of type-casting expressions formulated by Tableau difers from that for PostgreSQL.

In the demonstration, we will use a scatterplot query to show how the database expert Bob introduces a rule to remove type-casting. Diferent from PostgreSQL, MySQL does not support trigram indexes. Thus for such queries, a rule similar to Rule-2 does not help for MySQL. This observation also shows the value of human involvement in the proposed framework because human knowledge about a specific database should be considered in the life cycle of query rewriting. [2] slintel, Tableau software market share, 2021. URL: https://www.slintel.com/tech/data-visualization/ tableau-software-market-share, last accessed 2022-01-25. [3] Tableau, Tableau online tips: Extracts, live connections, and cloud data, 2021. URL: https://www.tableau.com/about/blog/2016/4/ tableau-online-tips-extracts-live-connectionscloud-data-53351, last accessed 2022-01-25. [4] W. et al., An analytic data engine for visualization in tableau, in: SIGMOD 2011, Athens, Greece, June 12-16, 2011, ACM, 2011, pp. 1185–1194. [5] G. Lohman, Is query optimization a “solved” prob

lem?, ACM SIGMOD Blog. ACM Blog (2014). [6] Z. Liu, J. Heer, The efects of interactive latency on exploratory visual analysis, IEEE Trans. Vis.

Comput. Graph. 20 (2014) 2122–2131. [7] C. A. Pahins, S. A. Stephens, et al., Hashedcubes:

Simple, low memory, real-time visual exploration of big data, IEEE Trans. Vis. Comput. Graph. 23 (2017) 671–680. [8] A. C. et al., The case for interactive data exploration accelerators (ideas), in: HILDA@SIGMOD 2016, San Francisco, CA, USA, June 26 - July 01, 2016,

ACM, 2016, p. 11. [9] S. A. et al, Automated selection of materialized views and indexes in SQL databases, in: VLDB 2000, September 10-14, 2000, Cairo, Egypt, Morgan

Kaufmann, 2000, pp. 496–505. [10] J. Goldstein, P. Larson, Optimizing queries using materialized views: A practical, scalable solution, in: SIGMOD 2001, Santa Barbara, CA, USA, May 21-24, 2001, 2001, pp. 331–342. [11] L. B. et al, Dynamic prefetching of data tiles for interactive visualization, in: SIGMOD 2016, San Francisco, CA, USA, June 26 - July 01, 2016, ACM, 2016, pp. 1363–1375. [12] L. Battle, Behavior-driven optimization techniques for scalable data exploration, Ph.D. thesis, MIT,

Cambridge, USA, 2017. [13] R. M. et al., Bao: Making learned query optimization practical, in: SIGMOD ’21, China, June 20-25, 2021,

ACM, 2021, pp. 1275–1288. [14] Oracle, Postgresql 14: Chapter 41. the rule system, 2021. URL: https://www.postgresql.org/docs/ current/rules.html, last accessed 2022-01-25. [15] Oracle, Mysql 8.0: Using the rewriter query rewrite plugin, 2021. URL: https: //dev.mysql.com/doc/refman/8.0/en/rewriterquery-rewrite-plugin-usage.html, last accessed 2022-01-25. [1] C. S. et al., Polaris: A system for query, analy- [16] A. V. et al., Get real: How benchmarks fail to repsis, and visualization of multidimensional relational resent the real world, in: DBTest@SIGMOD 2018, databases, IEEE Trans. Vis. and Comput. Graph. 8 Houston, TX, USA, June 15, 2018, ACM, 2018, pp. (2002) 52–65. 1:1–1:6.

5. Open Problems

There are open research challenges related to using rewriting rules in VisBooster to solve the scalable in-situ data visualization problem in this middleware architecture. For example, how to help the database experts identify the rewriting rules? How and when to apply them (e.g., diferent query hints should be applied to diferent queries)? In what order should the rules be applied? We plan to address these challenges in our future work.