<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title/>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Steering the PostgreSQL query optimizer using hinting: State-Of-The-Art and open challenges</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jerome Thiessat</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dirk Habich</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfgang Lehner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
<aff id="aff0">
          <label>0</label>
          <institution>Dresden Database Research Group, TU Dresden</institution>, <country country="DE">Germany</country>
          <email>Jerome.Thiessat@tu-dresden.de</email>
          <email>Dirk.Habich@tu-dresden.de</email>
          <email>Wolfgang.Lehner@tu-dresden.de</email>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>In the field of query optimization, a tremendous amount of research has been delivered into fixing and adjusting existing optimizer solutions. Moreover, recent work has also shown that substantial performance gain can be achieved by setting query optimizer instructions appropriately. However, the sets of beneficial instructions may vary greatly for each query. In this contribution, we provide a state-of-the-art review on optimizer hinting and present some experimental evaluations showing hints should not be neglected during evaluation based on preliminary assumptions.</p>
      </abstract>
      <kwd-group>
        <kwd>Optimizer Configuration</kwd>
        <kwd>Query Optimization</kwd>
        <kwd>Search Space Traversal</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
        <p>In the vast field of database management system (DBMS) design, query optimization has proven to be one of the backbone components of query performance. Modern query optimizers consist of the following three components [<xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>]: (i) plan enumerator, (ii) cost model, and (iii) cardinality estimator. These components are hierarchically dependent: the plan enumerator relies on the calculations of the cost model, which in turn relies on the cardinality estimator's results. Since traditional optimizers have been shown to produce error-prone estimates [<xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>], current research establishes various approaches for refining plan enumerators [<xref ref-type="bibr" rid="ref6">6, 7</xref>], cost models [8, 9], and cardinality estimation [10, 11, 12]. These approaches all aim at adding, correcting, or substituting existing optimizer components to obtain faster query executions.</p>
      <p>One of the long-established [13], yet only recently popular approaches [<xref ref-type="bibr" rid="ref5">14, 15, 5</xref>] has been the targeted steering of optimizer behavior through configuration options called hints. These hints can generally be categorized in the class of enumerator approaches. However, instead of substituting or correcting an existing enumerator, hints restrict the search space of enumerators, instructing the planner to ignore certain plans. Nevertheless, hints cannot only be used for restricting plan enumeration: recent research has shown that hinting can even be used to indirectly learn optimizer cost models [14, 15].</p>
      <p>Contribution: Our core contributions in this paper are (i) a review of the state-of-the-art research in the field of optimizer hinting and (ii) experimental evaluations of current challenges that need to be tackled to further advance this research field. We do so at the example of the highly popular open-source DBMS PostgreSQL (PSQL).</p>
      <p>Outline: In Section 2, we explain query hinting at the example of PSQL. We go into detail about the query optimizer hints present in PSQL and explain which hints can be observed, how they behave, and how the system's hinting has evolved over the latest versions. Section 3 depicts the state-of-the-art systems for utilizing these hints to harness hidden optimizer potential. We provide insights into leveraging hinting beyond the presented state-of-the-art in Section 4, where we obtain valuable information regarding the full potential of hinting in query optimization based on the popular Join-Order-Benchmark (JOB) [<xref ref-type="bibr" rid="ref3">3</xref>]. Lastly, we open up the discussion in Sections 5 and 6, where we discuss possible highly valuable research directions, hinting in general, and our reflections.</p>
      <p>2. Query Hinting in PostgreSQL</p>
      <p>Modifying the PSQL optimizer can be done mainly by setting Boolean configuration parameters called query planner method configurations<sup>1</sup>. We restrict our contribution to these Boolean hints as they allow us to steer the optimizer plan enumeration phase rather than interfering with the underlying complex cost model. In their nature, these configurations are mostly instructions to force the optimizer to restrict itself to certain operations while traversing a query's plan space. However, since not all of these configurations can be forcefully switched off<sup>2</sup>, the term hint has established itself [<xref ref-type="bibr" rid="ref5">14, 5</xref>]. Additionally, hint set refers to multiple hints, and hinting to the act of instructing the PSQL query optimizer with a hint set.</p>
      <p>Notably, in PSQL, hints are global, which implies that they act upon the optimization of a whole query. This means that, e.g., disabling the hint "Nested Loop Join" will disable the usage of this operator (if possible) for the entire query plan optimization phase.</p>
      <p>Even the oldest still supported version (i.e., version 12) of PSQL has a plethora of these hinting options which may be highly valuable to investigate. A rigorous set of Boolean hints, beginning from PSQL version 12 up to the latest version at the time of this writing (i.e., version 16), can be found in Figure 1. We observe that the base amount of hints is already plentiful. While the core hints<sup>3</sup> of PSQL consist of six traditional hints, namely index, index-only, and sequential scan, as well as hash, merge, and nested loop join, there are multiple other options to be considered.</p>
      <p>[Figure 1: PSQL hint development starting from version 12, grouping the hints into scan, join, partitioning, parallel execution, aggregation, and miscellaneous categories across versions 12 to 16; hints that can only be discouraged rather than fully disabled are marked.]</p>
      <p>Bitmap scans, for example, scan multiple tuple pointers, which are then sorted by their physical location to allow ordered access over all locations. Especially combining multiple bitmaps is easy, as multiple bitmap scans can be combined through simple logical operator combination. TID scans provide fast access with the knowledge of the tuple ids (i.e., physical locations) of a row. Notably, tuple ids provide no identifying behavior like primary keys, as their value may change during the lifespan of their respective database table.</p>
      <p>Additional non-core hints include hints for parallel execution, aggregation, and general miscellaneous settings. Within parallel execution, gather merge indicates the use of the (gather) merge node within a plan that allows child nodes of a query plan, or in fact the whole plan, to be executed in parallel. While the gather node solely merges results, the gather merge node indicates that subnodes output sorted tuples which are then merged in a sort-preserving manner. The parallel append works similarly to the regular append when combining rows from multiple sources. However, instead of processing every child node subsequently in parallel fashion, processes are spread across multiple child nodes to process them simultaneously, rather than allowing fully parallel execution of a single child node. For parallel hash, instead of building a hash table for each process, a common hash table is shared among processes. The last parallel hint consists of async append. This hint allows parallel append plans on foreign tables, supporting sharded sources.</p>
      <p>The aggregation hints consist of hash and presorted aggregation. Hash aggregation allows the use of hash functions for splitting, aggregating, and merging a target column. Presorted aggregation allows the query planner to construct plans that produce rows that are already sorted for further aggregation.</p>
      <p>Within the miscellaneous section are five hints. Materialization enables caching of intermediate results in memory. Sort naturally allows the usage of sort operations within a plan. Geqo decides whether to use the genetic optimizer of PSQL; this optimizer is by default switched on if the number of joins in a query exceeds twelve. Incremental sort enables the usage of an optimized multi-key sorting method that takes advantage of already sorted columns. Memoize encourages the use of nested loop joins, as memoize caches parameterized scans for nested loop joins only.</p>
      <p>Additionally, there are three options for usage in a partitioned table environment. However, as these partitioning hints require the building of partitioned tables, they remain mostly unused. While the ability to prune partitioned tables remains on, as for most hints, the partition-wise hints are off by default, as partitioning requires additional attention and is not part of the default behavior of PSQL.</p>
      <p>Moreover, there are also some hints that cannot be completely switched off, to ensure that a plan can always be created. Switching these hints off highly discourages their use in the planning phase. These hints are marked in red in Figure 1. Moreover, the use of parallel hashing naturally requires hash joins to be enabled.</p>
      <p>Now, we are able to further understand the possible implications of the presented PSQL hints. In the following, we dive deeper into the state-of-the-art that utilizes the presented hints to gain advantages during the optimization of queries.</p>
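      <p>In practice, such a hint set is applied by flipping the corresponding configuration parameters before a query runs. The fragment below is a minimal sketch: the enable_* parameters are PostgreSQL's actual planner method configurations, while the query and table names are merely illustrative.</p>

```sql
BEGIN;
-- Scope the hint set to this transaction only.
SET LOCAL enable_nestloop  = off;  -- discourage nested loop joins
SET LOCAL enable_seqscan   = off;  -- discourage sequential scans
SET LOCAL enable_mergejoin = off;  -- discourage merge joins
-- Inspect the plan the restricted enumerator now produces.
EXPLAIN SELECT * FROM title t JOIN movie_info mi ON t.id = mi.movie_id;
COMMIT;  -- SET LOCAL settings revert at transaction end
</p>
```

      <p>Note that parameters such as enable_seqscan only discourage an operator: the planner still falls back to it when no alternative plan exists, matching the hints marked as not completely switchable above.</p>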
      <sec id="sec-1-1">
        <title>Footnotes</title>
        <p>2: as this may hinder the optimizer to construct a query plan at all. 3: https://www.postgresql.org/docs/16/planner-optimizer.html, accessed: March 4th, 2024.</p>
      </sec>
      <sec id="sec-1-2">
        <title>3. Query Optimization Using Optimizer Hints</title>
        <p>While modifying the core components of an optimizer through different algorithms and machine learning models is currently investigated with great interest [<xref ref-type="bibr" rid="ref6">8, 9, 11, 10, 12, 6, 7</xref>], the steering of optimization using optimizer hints remains a topic of deeper research. However, there are still some major contributors in this field.</p>
        <p>BAO. The first approach is the bandit optimizer (BAO) [14]. BAO is a machine learning model that uses hints to indirectly learn the optimizer's cost model. BAO builds upon a commonly used machine learning technique derived from its successful use in learned query optimization, namely reinforcement learning. At its core, BAO uses a fixed number of hint sets, which are used to obtain different PSQL EXPLAIN plans during optimization. Each of the resulting plans is used as input for a Tree Convolutional Neural Network (TCNN). The TCNN of BAO is built similarly to a regular convolutional neural network, with an additional initial tree flattening layer to transform the input tree. The output of the TCNN is an estimated query cost value that is used to determine which of the input explain plans to use. This allows BAO to implicitly use the hints through PSQL's explain functionality. Lastly, Thompson sampling is used to enable the reinforcement learning character of BAO; it balances exploration and exploitation by guiding the data upon which BAO is trained.</p>
        <p>[Figure 2: Evaluation of FASTgres and BAO on the real-world workload Stack-Overflow [<xref ref-type="bibr" rid="ref5">5</xref>]; speedup factor over the PG default plotted against the training split in percent.]</p>
        <p>However, there are some considerations to account for when using BAO. First, BAO uses only the six main hints of PSQL, which is merely a subset of the already presented hints. Even inside these six hints (i.e., 2<sup>6</sup> = 64 hint set combinations), only a subset of five hint sets is feasible enough that optimization time does not start to exceed the actual runtime. Moreover, BAO uses an indirect approach that necessitates evaluating multiple query plans before being able to decide which one to choose. Naturally, doing so does not scale well with increasing numbers of hint sets. Lastly, BAO's input plans rely on PSQL EXPLAIN plans, which have been shown to produce notoriously error-prone estimates [4, 5].</p>
        <p>Autosteer. As a successor to BAO, the approach of Autosteer [15] was developed. Just like BAO, Autosteer is a learned cost model that uses query hinting. This approach tackles the existing challenge of selecting hint sets, which was solved by manually applying expert knowledge in BAO. Their hint set traversal algorithm consists of iteratively combining efficient query spans [16]. For each combination, they determine the query's runtime and decide whether to keep the combination or not, based on the default query execution time. While this algorithm improves on BAO's fixed hint sets, there are still some challenges.</p>
        <p>First, a rather large assumption is made by postulating that not all hint sets are beneficial. While this may hold true for certain workloads, DBMSs, and specific versions, the benefits of hints cannot be deemed static. Additionally, a key assumption is the prior knowledge of the query span (i.e., the efficient hints) of a query. Such knowledge is usually obtained only through rigorous analysis of query workloads or even individual queries, depending on their versatility. As it is infeasible to have such information a priori, and knowledge from previous queries does not necessarily reflect future ones, we argue that determining query spans efficiently for unseen queries is a challenging and under-investigated task.</p>
        <p>FASTgres. Lastly, there is another approach, named FASTgres [<xref ref-type="bibr" rid="ref5">5</xref>], that breaks with the commonly used reinforcement learning approaches for learned optimizers and optimizer steering. FASTgres settles directly at the plan enumerator by restricting its search space through hinting predictions. FASTgres uses a supervised classification approach to predict the most beneficial hint set directly from incoming queries. Here, queries are predetermined into different query classes called contexts. In FASTgres, contexts can be defined in multiple granularities but are commonly distinguished by their table join groups. Within each context, a supervised gradient boosting model is used to predict a query's best hint set. This approach naturally differs from previous work by using a divide-and-conquer approach to provide locally well-performing models that are robust against data and workload drifts. This robustness is achieved not only by using local models but also by providing an experience-based query anomaly detection that retrains context models once abnormally performing queries are received. Additionally, since a supervised approach is applied, labeled data is required. Such labeled query data is obtained by using a grid search strategy with aggressive timeouts determined by the currently best query-hint-set combination.</p>
        <p>Generally, FASTgres has been shown to provide superior results with the usage of supervised learning for hint set prediction, as observable in Figure 2.</p>
        <p>[Figure 3: Positive impact of hint deactivation, JOB (113 queries); y-axis: occurrence count, x-axis: the individual PSQL hints.]</p>
        <p>While the performance of supervised classification for hint set prediction shows promising results, there are also challenges that need to be overcome. Just like BAO, FASTgres only utilizes six hints. While they traverse the whole search space rather than only a few combinations of hints, there is still the huge challenge of scalability.</p>
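      <p>The exploration-exploitation loop that BAO's Thompson sampling performs over hint sets can be sketched without the TCNN. The stand-in below samples expected runtimes from a Gaussian per hint set instead of a learned cost model; the arm names and runtimes are invented for illustration and are not BAO's actual hint sets.</p>

```python
import random

# Five hint sets over the six core hints, mirroring BAO's fixed-arm setup.
# Names and runtimes are illustrative, not BAO's actual arms.
HINT_SETS = ("default", "no_nestloop", "no_seqscan", "no_mergejoin", "no_hashjoin")

class ThompsonHintPicker:
    """Gaussian Thompson sampling over observed runtimes (lower is better)."""

    def __init__(self, arms, seed=0):
        self.rng = random.Random(seed)
        self.obs = {arm: [] for arm in arms}

    def pick(self):
        def sampled_runtime(arm):
            runs = self.obs[arm]
            if not runs:                      # unexplored arms are tried first
                return float("-inf")
            mean = sum(runs) / len(runs)
            spread = 1.0 / len(runs)          # uncertainty shrinks with evidence
            return self.rng.gauss(mean, spread)
        return min(self.obs, key=sampled_runtime)

    def observe(self, arm, runtime_s):
        self.obs[arm].append(runtime_s)

picker = ThompsonHintPicker(HINT_SETS, seed=1)
counts = {arm: 0 for arm in HINT_SETS}
for _ in range(300):
    arm = picker.pick()
    counts[arm] += 1
    # Stand-in for executing the query under `arm` and timing it:
    picker.observe(arm, {"default": 2.0, "no_nestloop": 1.0}.get(arm, 3.0))
```

      <p>After a short exploration phase, the picker concentrates on the fastest arm while still occasionally probing the others, which is the behavior that steers the data BAO trains on.</p>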
        <p>Even with their provided aggressive timeout strategy, search spaces that grow exponentially have to be evaluated fully. If we observe the Boolean hint sets from PSQL in Table 1, the hint set search space would constitute 2<sup>22</sup> ≈ 4.2 · 10<sup>6</sup> combinations, which is infeasible to label with FASTgres' strategy. Having such scalability restrictions naturally leaves open the question of how much potential is neglected when labeling queries.</p>
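      <p>The scalability gap can be made concrete with quick arithmetic: every Boolean hint doubles the search space, so the six core hints span 64 hint sets while the 22 Boolean hints of version 16 span roughly 4.2 million.</p>

```python
# Every Boolean hint doubles the number of possible hint sets.
core_hints = 6
all_hints = 22                  # Boolean planner hints in PSQL v16 (cf. Figure 1)

core_space = 2 ** core_hints    # 64 combinations, the space BAO/FASTgres work in
full_space = 2 ** all_hints     # 4194304, i.e. roughly 4.2 * 10**6

print(core_space, full_space)   # 64 4194304
```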
        <p>To depict open challenges emerging from the state-of-the-art work on hinted optimizer steering, we now investigate the potential of hinting at the example of PSQL.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Optimization Potential of Query Hinting</title>
      <sec id="sec-3-1">
        <p>Optimizer hinting is a volatile task in its nature. There are many different influencing aspects that need to be considered, such that they become intangible rapidly. For example, when considering the decision of whether to use a hash join in a query or not, the influencing factors can depend on the collected statistics, available memory for building hash tables, possible parallel execution, the system's hardware<sup>4</sup>, the system's build version<sup>5</sup>, and possibly many more. As subsets of these influencing factors are subject to constant change, the proper determination of operator choice for a single query becomes a tedious task that is infeasible to fulfill properly during runtime. With these factors in mind, it is only natural to assume that, by default, every hint set can have an influence on a query. With this general assumption, we now investigate the PSQL query optimizer in further detail. For the following experiments, we evaluated - unless otherwise stated - on a PSQL Docker image that has been properly configured for its hardware (i.e., using PG-Tune<sup>6</sup>) and using an ANALYZE step to build initial database statistics. Additionally, each query-hint-set combination has been run once before measuring, ensuring properly equal pre-warming. The used hardware consists of an Intel Xeon Gold 6216 CPU (Skylake architecture) with 12 cores, 92 GiB of main memory, and 1.8 TiB of HDD storage.</p>
        <p>As we observed in Figure 1, there are 22 hints in the latest version of PSQL (i.e., v16) at the time of writing. Since analyzing these hints can become increasingly time consuming, rather than taking a large workload like Stack-Overflow [14] with 6191 queries, we focus on JOB, which contains real-world data from IMDB and a query workload of 113 analytical queries. Since JOB does not provide partitioned tables by default, the partition hints for pruning, join, and aggregation are neglected in the further analysis.</p>
        <p>In our first experiment, we evaluate whether the option of turning off a hint has a positive impact on the overall runtime of a query or not. For each JOB query, the influence of each hint was measured. The results can be observed in Figure 3 for PSQL version 16.2. Along the x-axis, every hint is displayed. The y-axis shows how many times in the whole workload a positive performance impact was noted by allowing a hint to be switched off from its default setting. Notably, every hint has at least 20 cases out of 113 in which turning it off is beneficial. In detail, allowing nested loop joins to be switched off accounts for 26 positive impacts (i.e., 23%), while allowing bitmap scans to be switched off even accounts for 88 positive impacts (i.e., 78%). This implies that every hint on its own may have useful cases in which its impact cannot be neglected.</p>
        <p>While these insights show that considering every hint is a sensible choice, the exact impact of these hints remains ambiguous. Results of the average speedup gained from these positive impacts can be taken from Figure 4. Here, the y-axis displays the average speedup factor that is obtained throughout the whole workload within the positively impacting hints. Notably, the option to turn off sequential scanning now provides average speedups of factor six, while bitmap scans, with the highest previous occurrence, only provide speedup factors of just above one.</p>
      </sec>
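      <p>The per-hint measurement loop behind Figure 3 can be sketched as follows. The runner below is a stub standing in for actually timing a query against PSQL under a given set of disabled hints; the queries, runtimes, and the single hint name are illustrative.</p>

```python
def positive_impacts(queries, hints, run_query):
    """Count, per hint, the queries that speed up when only that hint
    is switched off; run_query returns a runtime in seconds."""
    counts = {hint: 0 for hint in hints}
    for query in queries:
        default_runtime = run_query(query, disabled=frozenset())
        for hint in hints:
            runtime = run_query(query, disabled=frozenset({hint}))
            if runtime < default_runtime:
                counts[hint] += 1
    return counts

# Stub runtimes: disabling enable_nestloop helps q1 but hurts q2.
TIMINGS = {
    ("q1", frozenset()): 2.0,
    ("q1", frozenset({"enable_nestloop"})): 1.0,
    ("q2", frozenset()): 1.0,
    ("q2", frozenset({"enable_nestloop"})): 5.0,
}

counts = positive_impacts(["q1", "q2"], ["enable_nestloop"],
                          lambda q, disabled: TIMINGS[(q, disabled)])
print(counts)   # {'enable_nestloop': 1}
```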
      <sec id="sec-3-2">
        <p>Footnote 4: i.e., memory, SSD, HDD availability. Footnote 5: i.e., their efficiency in implementing the operator and underlying cost factors. Footnote 6: https://pgtune.leopard.in.ua/, accessed April 25th, 2024.</p>
        <p>[Figure 4: Average speedup per positive occurrence, JOB (113 queries). Figure 5: Average gain per positive occurrence, JOB (113 queries).]</p>
        <sec id="sec-3-2-1">
          <p>Along the y-axis of Figure 5, we display the time savings that can be achieved from allowing a hint to be switched off, at logarithmic scale. Again, contrary to the indications from Figures 3 and 4, we notice that fundamental time savings lie within the nested loop join, index scan, sequential scan, and merge join, which are part of the six core hints. However, we also find fundamental potential in the parallel hashing, Geqo, and memoize options.</p>
          <p>Overall, we observe that within the JOB workload, all hints may vary in occurrence, speedup, and absolute time gain without indicating rules that can be followed. Our empirical studies on other PSQL versions show that even between versions, there are no rules to be delineated that could lead to inferences for future queries.</p>
          <p>When focusing on single queries in the workload, we
obtain a more detailed view about the query variety. We
show an excerpt in Figure 6 due to space restrictions,
while noting that the varying behavior can be observed
throughout the whole workload.</p>
          <p>We notice a variety of different behaviors. Query 1 from Figure 6a contains five joins and four filters, of which two are wildcard filters. The three major hints are sequential scan, nested loop join, and hash join. Having potential speedup factors of over twelve indicates that the default PSQL optimizer had issues correctly determining scan and join operators, favoring sequential scans over index and index-only scans, which indicates too low selectivity estimation for scan operations. Additionally, the potential gain through hash joins indicates that the default optimizer might have estimated too high selectivity join results, leading to the choice of hash joins over nested loop or merge joins. On the other hand, disabling nested loop joins results in catastrophic deterioration of execution time, reinforcing the thought that nested loop joins are the operators that should be predominantly used in this query. Query 8 contains five joins and two filter predicates. Figure 6b displays a scenario where the default optimization is already surprisingly efficient. However, even then, small improvements are still possible on multiple hints, with disabling merge joins and sorting as the predominant ones. Query 32 contains six joins with one filter predicate. Figure 6c shows a result that is quite common in our analysis: there are a lot of hints which carry speedup potential, while some hints may lead to catastrophic execution times with respect to the default evaluation.</p>
          <p>With these experiments, we could show at the example of PSQL and JOB that there is no panacea for finding the best hints of a query or even workload prior to evaluation. This leads us to the conclusion that query hinting is a complex topic that necessitates further investigation to properly infer the usefulness of each query-hint-set combination.</p>
        </sec>
        <sec id="sec-3-2-2">
          <p>These results imply that the occurrence of positive hints does not need to correlate with proper speedups across the workload.</p>
          <p>Moreover, while the speedup factor is a valuable indicator of how much factor-wise gain we can obtain by switching individual hints on or off, another valuable insight can be achieved by looking at the absolute time impact of these hints. Results portraying the absolute gain can be observed in Figure 5.</p>
          <p>[Figure 6: Single hint influence on (a) query 1a, (b) query 8c, and (c) query 32a; x-axes list the individual PSQL hints.]</p>
          <p>5. Future Research</p>
          <p>The potential for future research regarding optimizer hinting is manifold. As we have shown, every hint has to be considered when trying to optimize queries. However, within each query, we have seen that some hints have less influence than others. For example, an algorithmic solution with efficient hint pruning methods and possibly early stopping mechanisms that can be run without prior knowledge of efficient hints may prove promising. By radically reducing the search space, this might allow easier scalability when traversing query-hint-set combinations. Such an algorithm could evaluate each hint set on its own, either from a subtractive (i.e., step-wise disabling) or additive (i.e., step-wise enabling) starting point. Either solution would provide insight into the one-ring neighborhood (i.e., the neighboring hint sets that only differ by one hint being switched off or on) of changing hints from a default situation in which either all hints are switched on or off. Additionally, further steps could be decided based on these initial results. Such further steps can include the use of stopping criteria based on previous results, the current one-ring neighborhood results, previously observed queries, and possibly many more.</p>
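          <p>The one-ring neighborhood described above is cheap to enumerate: flipping exactly one of the 22 hints yields exactly 22 neighbors, regardless of whether the starting point is subtractive (all hints on) or additive (all hints off). A minimal sketch with stand-in hint names:</p>

```python
def one_ring(hint_set, all_hints):
    """All hint sets that differ from hint_set by exactly one hint."""
    neighbors = []
    for hint in all_hints:
        flipped = set(hint_set)
        if hint in flipped:
            flipped.discard(hint)   # subtractive step: switch this hint off
        else:
            flipped.add(hint)       # additive step: switch this hint on
        neighbors.append(frozenset(flipped))
    return neighbors

ALL = [f"hint_{i}" for i in range(22)]   # stand-ins for the 22 Boolean hints
ring = one_ring(frozenset(ALL), ALL)     # subtractive starting point: all on
print(len(ring))   # 22 neighbors, each with 21 hints still enabled
```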
          <p>Lastly, even though current learned models provide
reasonable speedup gains, the development of a tailored
featurization for query hinting seems lucrative, as
currently, only featurization options that are tailored for
cardinality estimation are in use. Potentially, such a
dedicated featurization method may require smaller input
vectors and generalize better, depending on the chosen
representation.</p>
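          <p>As an illustration of what such a dedicated featurization could look like, the sketch below one-hot encodes only plan-space-relevant query properties (joined tables and predicate kinds). The vocabularies and feature choice are entirely hypothetical and are not taken from any of the discussed systems.</p>

```python
def featurize(query_tables, predicate_kinds, table_vocab, predicate_vocab):
    """One-hot encode the tables a query joins and its predicate kinds."""
    table_bits = [1 if t in query_tables else 0 for t in table_vocab]
    pred_bits = [1 if p in predicate_kinds else 0 for p in predicate_vocab]
    return table_bits + pred_bits

TABLES = ["title", "movie_info", "cast_info"]   # illustrative JOB tables
PREDICATES = ["equality", "range", "wildcard"]

vec = featurize({"title", "movie_info"}, {"wildcard"}, TABLES, PREDICATES)
print(vec)   # [1, 1, 0, 0, 0, 1]
```

          <p>Such a representation is far smaller than a cardinality-oriented encoding, since it only has to separate queries whose beneficial hint sets differ.</p>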
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusion</title>
      <sec id="sec-4-1">
        <p>Lastly, we summarize our findings in this contribution. We showed that there is already valuable research in the field of query optimizer hinting that focuses on a small set of hints that is deemed feasible. Additionally, we could show at the example of JOB and PSQL that hinting is a manifold challenge that cannot be heuristically reduced to a mere subset of hints but must be considered as a whole to be able to extract all possible gains. Moreover, we showed that promising hints cannot be simply predetermined without extensive analysis and that the impact of hint sets varies between different queries and also across different PSQL versions.</p>
        <p>Even though [15] provides an algorithmic solution for their hint set traversal problem, there are aspects that still do not suffice for a scalable traversal solution. As shown in our evaluations, the usage of a predetermined set of hints for initial evaluation, namely query spans, is not derivable beforehand and, thus, impractical. We could show that the research field for hint set traversal offers plenty of optimization potential for future work.</p>
        <p>References (continued):
[6] R. Marcus, O. Papaemmanouil, Deep reinforcement learning for join order enumeration, in: aiDM@SIGMOD, 2018, pp. 3:1–3:4.
[7] S. Krishnan, Z. Yang, K. Goldberg, J. M. Hellerstein, I. Stoica, Learning to optimize join queries with deep reinforcement learning, CoRR abs/1808.03196 (2018).
[8] R. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, T. Kraska, O. Papaemmanouil, N. Tatbul, Neo: A learned query optimizer, Proc. VLDB Endow. 12 (2019) 1705–1718.
[9] Z. Yang, W. Chiang, S. Luan, G. Mittal, M. Luo, I. Stoica, Balsa: Learning a query optimizer without expert demonstrations, in: SIGMOD, 2022, pp. 931–944.
[10] M. Stillger, G. M. Lohman, V. Markl, M. Kandil, LEO - DB2's learning optimizer, in: VLDB, 2001, pp. 19–28.
[11] A. Kipf, T. Kipf, B. Radke, V. Leis, P. A. Boncz, A. Kemper, Learned cardinalities: Estimating correlated joins with deep learning, in: CIDR, 2019.
[12] L. Woltmann, C. Hartmann, M. Thiele, D. Habich, W. Lehner, Cardinality estimation with local deep learning models, in: aiDM@SIGMOD, 2019, pp. 5:1–5:8.
[13] D. K. Burleson, Oracle high-performance SQL tuning, McGraw-Hill, Inc., 2001.
[14] R. Marcus, P. Negi, H. Mao, N. Tatbul, M. Alizadeh, T. Kraska, Bao: Making learned query optimization practical, SIGMOD Rec. 51 (2022) 6–13.
[15] C. Anneser, N. Tatbul, D. E. Cohen, Z. Xu, P. Pandian, N. Laptev, R. Marcus, Autosteer: Learned query optimization for any SQL database, Proc. VLDB Endow. 16 (2023) 3515–3527.
[16] P. Negi, M. Interlandi, R. Marcus, M. Alizadeh, T. Kraska, M. T. Friedman, A. Jindal, Steering query optimizers: A practical take on big data workloads, in: SIGMOD, 2021, pp. 2557–2569.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Selinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Astrahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Chamberlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Lorie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Price</surname>
          </string-name>
          ,
          <article-title>Access path selection in a relational database management system</article-title>
          ,
          <source>in: SIGMOD, ACM</source>
          ,
          <year>1979</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Perron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kraska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stonebraker</surname>
          </string-name>
          ,
          <article-title>How I learned to stop worrying and love reoptimization</article-title>
          ,
          <source>in: ICDE</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1758</fpage>
          -
          <lpage>1761</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Leis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gubichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mirchev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Boncz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kemper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <article-title>How good are query optimizers, really?</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>9</volume>
          (
          <year>2015</year>
          )
          <fpage>204</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Leis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Radke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gubichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mirchev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Boncz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kemper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <article-title>Query optimization through the looking glass, and what we found running the join order benchmark</article-title>
          ,
          <source>VLDB J</source>
          .
          <volume>27</volume>
          (
          <year>2018</year>
          )
          <fpage>643</fpage>
          -
          <lpage>668</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Woltmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thiessat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hartmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Habich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehner</surname>
          </string-name>
          ,
          <article-title>FASTgres: Making learned query optimizer hinting effective</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>16</volume>
          (
          <year>2023</year>
          )
          <fpage>3310</fpage>
          -
          <lpage>3322</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Papaemmanouil</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning for join order enumeration</article-title>
          ,
          <source>in: aiDM@SIGMOD</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>