<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Scalable Black-Box Optimization System for Auto-Tuning VLSI Synthesis Programs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matthew M. Ziegler</string-name>
          <email>zieglerm@us.ibm.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hung-Yi Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Gristede</string-name>
          <email>gristede@us.ibm.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bruce Owens</string-name>
          <email>browens@us.ibm.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo Nigaglioni</string-name>
          <email>nricardo@us.ibm.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca P. Carloni</string-name>
          <email>luca@cs.columbia.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Columbia University</institution>
          ,
          <addr-line>NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IBM Systems &amp; Technology Group</institution>
          ,
          <addr-line>Austin, TX</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>IBM Systems &amp; Technology Group</institution>
          ,
          <addr-line>Rochester, MN</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>IBM T. J. Watson Research Center</institution>
          ,
          <addr-line>Yorktown Heights, NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>Modern logic and physical synthesis tools provide numerous options and parameters that can drastically impact design quality; however the large number of options leads to a complex design space difficult for human circuit designers to navigate. We tackle this parameter tuning problem with a novel system employing intelligent search strategies and parallel computing, thus automating one of the key design tasks conventionally performed by a human designer. We provide an overview of this system, called SynTunSys, as well as results from employing it during the design of the IBM z13 22nm highperformance server chip, currently in production. During this major processor design, SynTunSys provided significant savings in human design effort and achieved a quality of results beyond what human designers alone could achieve, yielding on average a 36% improvement in total negative slack and a 7% power reduction.1 * Hung-Yi Liu is now with the Intel Design Technology &amp; Solutions Group, Hillsboro, OR, USA.</p>
      </abstract>
      <kwd-group>
        <kwd>synthesis</kwd>
        <kwd>design space exploration</kwd>
        <kwd>parameter tuning</kwd>
        <kwd>optimization</kwd>
        <kwd>VLSI design</kwd>
        <kwd>design methodology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The design of modern high-performance processors is a quest to optimally tune and
balance multiple objectives, such as performance, power, and reliability. This
multiobjective design space is further complicated by the need for more complex VLSI
(very-large-scale integration) chips to fuel the ever increasing desire for more
compute power. To cope with this design complexity, the VLSI design community has
leveraged CAD (computer-aided design) tools for many decades now; however, the
high flexibility and sophistication of advanced synthesis tools increases their
complexity and makes navigating the design space difficult and sometimes non-intuitive
for their users.</p>
      <p>
        The industrial synthesis tool-flow we employ has over 1000 parameters [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These
parameters span the logic and physical synthesis space and the control settings for
modifying the synthesis steps, such as: logic decomposition, technology mapping,
placement, estimated wire optimization, power recovery, area recovery, and/or higher
effort timing improvement. The parameters also vary in data type (Boolean, integer,
floating point, and string). Considering that an exhaustive search of only 20
Booleantype parameters leads to over one million combinations, and synthesis runs may take
several hours or even days, it is clear that intelligent search strategies are required.
      </p>
      <p>As an example of the wide design space available from modifying synthesis
parameters, Fig. 1 shows the scatter plot of achievable design points for a portion of a
synthesized floating-point multiplier macro. A macro may span from 1K to 1M gates
in our context. Each point denotes the timing and power values achieved simply by
tuning the input parameters of the synthesis program. The ultimate goal of this
process is to reach timing closure at the lowest achievable power. Quite often the default
values for the parameters are not ideal for a specific macro, which would benefit
instead from parameter customization. Fig. 1 also highlights three scenarios (A, B, and
C) along the Pareto frontier. These scenarios show the available tradeoffs between
timing closure and power reduction, e.g., point A closes timing with a 9% power
reduction, whereas point C improves timing by 55% with a 29% power reduction.
These points along the Pareto set provide a number of potential steps towards the
ultimate goal, depending on the additional techniques at the designer’s disposal
beyond parameter tuning. This example of a relatively simple macro underscores how
significantly the parameters settings can affect a design.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The synthesis parameter tuning problem we address can be classified as a black-box
optimization problem, i.e., we treat the synthesis program as black-box software by
supplying input conditions (input data and parameter settings) and measuring the
output response in terms of synthesis quality of results (QoR). Black-box problems
are often approached using techniques from the field of simulation optimization [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
which is an umbrella term for optimization techniques that operate in the absence in
of an algebraic model of the system. Considering each macro exhibits a unique
inputoutput response to synthesis parameter settings and digital logic can take on an
intractable number of functionalities, the synthesis tool-flow of our focus is far too complex
to be modelled algebraically. Furthermore, our synthesis program often exhibits
nondeterministic behavior, which is also a characteristic of many simulation optimization
problems.
      </p>
      <p>Black-box optimization techniques can also be employed for design-space
exploration (DSE) purposes, however, unlike convention DSE, the goal of black-box
optimization is often to find one or more optimal or near-optimal design points
without necessarily requiring a complete exploration of the design space to determine the
whole Pareto frontier of design tradeoff points.</p>
      <p>
        Black-box optimization is a common problem seen across a number of fields, e.g.,
compiler tuning [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and software engineering [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. With respect to VLSI design, DSE
is becoming a more attractive solution for complex problems across various levels of
abstraction. At the architectural level, many DSE studies based on models or
simulators have been used to explore multi-objective design spaces, e.g., [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Architecturallevel studies, however, typically do not result in implemented designs. DSE
approaches have also been applied in combination with high-level synthesis by a
number of researchers, e.g., [
        <xref ref-type="bibr" rid="ref6 ref7">6,7</xref>
        ]. FPGA synthesis parameter tuning has been reported in
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] using genetic algorithms and in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] using Bayesian optimization. However, prior
to our work [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], we know of no publications targeting automated parameter tuning
for logic and physical synthesis.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>System Architecture</title>
      <p>The framework of our parameter tuning system is shown in Fig. 2. SynTunSys
consists of a main tuning loop that constructs synthesis scenarios consisting of synthesis
parameter settings (Step (1)), submits and monitors synthesis jobs (2-3), analyzes the
results (4), and iteratively refines the solutions (5). A second background loop
archives the results of all runs from all macros, users, and projects, which can be mined
for historical trends across multiple macros and to provide feedback in terms of the
performance of synthesis parameters.</p>
      <p>The SysTunSys cost function conveys the optimization goals. It converts multiple
design metrics into a single cost number, allowing cost ranking of scenarios.
Examples of available metrics include: multiple timing metrics, power consumption,
congestion metrics, area utilization, electrical violations, runtime, etc. The selected
metrics are assigned weights to signify their relative importance. The overall cost
function is then a “normalized weighted sum” of the m selected metrics, expressed by
Equation (1) where Norm(Mi) is the normalized Mi across all the scenario results in a
SynTunSys run.
(1)</p>
      <p>
        The SynTunSys decision engine algorithms are key components of the system that
determine which scenarios should be run at each iteration, i.e., the decision engine
tackles the parameter tuning black-box optimization problem discussed previously.
The decision engine can also be upgraded independently and we constantly look to
improve these algorithms in terms of QoR prediction accuracy and compute
efficiency. Similar black-box tuning problems have been approached using a number of
techniques, e.g., machine learning [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Markov decision processes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and Bayesian
optimization [
        <xref ref-type="bibr" rid="ref12 ref9">9,12</xref>
        ]. During the development of SynTunSys we have explored
algorithms ranging from pseudo-genetic algorithms to on-line adaptive learning [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
4
      </p>
    </sec>
    <sec id="sec-4">
      <title>SynTunSys Results</title>
      <p>
        SynTunSys was used during the design of the IBM z13 22nm server processor [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
The processor underwent two chip releases (tapeouts) over a multi-year design cycle,
during which SynTunSys was applied to macros over both releases. The chip consists
of a few hundred macros that average around 30K gates in size, with larger macros in
the 300K gate range. Prior IBM server processors have also used systematic
parameter tuning on a smaller scale. The IBM POWER7+ [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and POWER8 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] processors
employed an earlier version of SynTunSys during the second chip releases, but
mainly for power reduction purposes. In the case of the z13 processor, however,
SynTunSys was employed during the entire design cycle for timing closure, power
reduction, and improving macro routability.
      </p>
      <p>Based on the efforts of a dedicated tuning team we were able to track SynTunSys
results on approximately 200 macros from the processor core. Table 1 shows the
average improvements achieved by SynTunSys over the best solution previously
achieved by the macro owners.</p>
      <p>Note that these results are based on the routed macro timing and power analysis; in
most cases the best known prior solutions included manual parameter tuning by the
macro owner. SynTunSys resulted in a 36% improvement in total negative slack, a
60% improvement in worst latch-to-latch slack (macro internal slack), and a 7%
power reduction. The actual values of the metrics, summed across all the macros,
underscores that the changes in the absolute numbers were significant, e.g., ~780,000
picoseconds of total negative slack was saved across ~200 macros.
5</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Trevillyan</surname>
          </string-name>
          , et al.,
          <article-title>“An Integrated Environment for Technology Closure of DeepSubmicron IC Designs,”</article-title>
          <source>IEEE Design &amp; Test of Computers</source>
          , vol.
          <volume>21</volume>
          :
          <issue>1</issue>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>22</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Amaran</surname>
          </string-name>
          , et al., “Simulation Optimization:
          <article-title>A Review of Algorithms</article-title>
          and Applications,”
          <fpage>4OR</fpage>
          - A
          <source>Quarterly Journal of Operations Research</source>
          , Dec.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Fursin</surname>
          </string-name>
          , et al.,
          <article-title>"Milepost GCC: Machine Learning Enabled Self-tuning Compiler,"</article-title>
          <source>International Journal Parallel Programming</source>
          ,
          <volume>39</volume>
          :
          <fpage>296</fpage>
          -
          <lpage>327</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Arcuri</surname>
          </string-name>
          , G. Fraser,
          <article-title>"Parameter Tuning or Default Values? An Empirical Investigation in Search-Based Software Engineering," Empirical Software Engineering</article-title>
          ,
          <year>June 2013</year>
          , Volume
          <volume>18</volume>
          , Issue 3.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Azizi</surname>
          </string-name>
          , et al.,
          <article-title>"An Integrated Framework for Joint Design Space Exploration of Microarchitecture and Circuits,"</article-title>
          <source>DATE</source>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Xydis</surname>
          </string-name>
          , et al.,
          <article-title>"A Meta-Model Assisted Coprocessor Synthesis Framework for Compiler/Architecture Parameters Customization,"</article-title>
          <source>DATE</source>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.-Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Carloni</surname>
          </string-name>
          , “
          <article-title>On Learning-Based Methods for Design-Space Exploration with High-Level Synthesis</article-title>
          ,
          <string-name>
            <surname>” DAC</surname>
          </string-name>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Papamichael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Milder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Hoe</surname>
          </string-name>
          ,
          <article-title>"Nautilus: Fast Automated IP Design Space Search Using Guided Genetic Algorithms,"</article-title>
          <source>DAC</source>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kapre</surname>
          </string-name>
          , et al.,
          <source>“Driving Timing Convergence of FPGA Designs through Machine Learning and Cloud Computing,” FCCM</source>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>M. M. Ziegler</surname>
            , H.-Y. Liu,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Gristede</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Owens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Nigaglioni</surname>
            ,
            <given-names>L. P.</given-names>
          </string-name>
          <string-name>
            <surname>Carloni</surname>
          </string-name>
          , “
          <article-title>A Synthesis-Parameter Tuning System for Autonomous Design-Space Exploration</article-title>
          ,
          <string-name>
            <surname>” DATE</surname>
          </string-name>
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Beltrame</surname>
          </string-name>
          et al.,
          <article-title>“Decision-Theoretic Design Space Exploration of Multiprocessor Platforms,”</article-title>
          <source>IEEE TCAD</source>
          ,
          <volume>29</volume>
          (
          <issue>7</issue>
          ):
          <fpage>1083</fpage>
          -
          <lpage>1095</lpage>
          ,
          <year>July 2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          et al.,
          <source>“Bayesian Optimization in High Dimensions via Random Embeddings,” Int'l Joint Conf. on Artificial Intelligence (IJCAI-13)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>M. M. Ziegler</surname>
            , H.-Y. Liu,
            <given-names>L. P.</given-names>
          </string-name>
          <string-name>
            <surname>Carloni</surname>
          </string-name>
          , “
          <article-title>Scalable Auto-Tuning of Synthesis Parameters for Optimizing High-Performance Processors,”</article-title>
          <string-name>
            <surname>ISLPED</surname>
          </string-name>
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Warnock</surname>
          </string-name>
          , et al.,
          <article-title>“22nm Next-Generation IBM System z Microprocessor,”</article-title>
          <string-name>
            <surname>ISSCC</surname>
          </string-name>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>M. M. Ziegler</surname>
            ,
            <given-names>G. D.</given-names>
          </string-name>
          <string-name>
            <surname>Gristede</surname>
            ,
            <given-names>V. V.</given-names>
          </string-name>
          <string-name>
            <surname>Zyuban</surname>
          </string-name>
          , “
          <article-title>Power Reduction by Aggressive Synthesis Design Space Exploration</article-title>
          ,
          <string-name>
            <surname>” ISLPED</surname>
          </string-name>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>M. M. Ziegler</surname>
          </string-name>
          , et al.,
          <article-title>“POWER8 Design Methodology Innovations for Improving Productivity</article-title>
          and Reducing Power,
          <string-name>
            <surname>” CICC</surname>
          </string-name>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>