<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Majorov International Conference on Software Engineering and Computer Systems, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Generation Frameworks: CPU Floating Point Unit Case</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleg Morozov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Antonov</string-name>
          <email>antonov@itmo.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>&amp; Saint Petersburg</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ITMO University</institution>
          ,
          <addr-line>Kronverksky Pr. 49, bldg. A, Saint-Petersburg, 197101</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>1</volume>
      <fpage>0</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>This research analyzes and compares the advantages and drawbacks of high-level design environments for the components of modern CPU cores. High-level synthesis (HLS) and hardware generation frameworks (HGF) are compared for the case of a floating-point execution unit (FPU). As a reference, we use the HGF-based FPU available in the open-source SonicBOOM RISC-V CPU design from Berkeley. An original HLS-based design of the FPU module is proposed. This design is functionally equivalent to the HGF-based one, but is described in behavioral (untimed) style, and its microarchitecture is optimized automatically by the HLS tool. The designed FPU has been synthesized in Vivado HLS and successfully tested in an FPGA device. The research has shown that raising the abstraction level to the behavioral one yields a design with comparable frequency and resource characteristics, but with a significantly more concise design specification and automatic generation of the microarchitecture. Based on these estimates, we envision HLS to be promising not only for accelerators that are external to the CPU, but also for components of modern CPUs themselves.</p>
      </abstract>
      <kwd-group>
        <kwd>High-level synthesis</kwd>
        <kwd>hardware generation</kwd>
        <kwd>hardware microarchitecture</kwd>
        <kwd>floating-point unit</kwd>
        <kwd>RISC-V</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Hardware design at the register-transfer level (RTL) with the corresponding design languages
(SystemVerilog, VHDL) has been dominant in industry in recent decades. This is due to its efficient
abstraction from basic structural devices (gates, multiplexers, etc.), concepts that are understandable
to a wide community of developers, and good support by design tools. However, time-to-market, cost, and
complexity constraints motivate the exploration of approaches that improve the design process. These
improvements include support for algorithmic specifications as design entry, automation of
microarchitectural synthesis from high-level specifications and configurations, and
scalability of designs to meet various performance, power, and area constraints.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Theoretical background</title>
    </sec>
    <sec id="sec-4">
      <title>2.1. High-level synthesis and hardware generation approaches</title>
      <p>High-level synthesis (HLS) and hardware generation frameworks (HGF) are two widely known
approaches to improvement of hardware design process. Despite some common priorities (abstract
specification, improving configurability, utilizing software experiences in hardware domain), these
approaches differ significantly.</p>
      <p>2020 Copyright for this paper by its authors.</p>
      <p>
        High-level synthesis is typically understood as automated synthesis of hardware structure from
behavioral (algorithmic), untimed specifications, effectively forming a new distinct abstraction level
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. C, C++, and SystemC are typically used as design entry languages. Microarchitectural
synthesis is performed automatically by the tool; although it can be steered to a certain extent via
pragmas and constraints, the design entry is abstracted from it. Most HLS tools perform a typical
set of operations: allocation of basic functional units, scheduling of operations with regard to
their dependencies and timing constraints, and binding of these operations to the allocated functional units.
Optimizations are applied to programmatic models such as the Control and Data Flow Graph (CDFG).
The shorter design cycle of behavioral synthesis allows many alternative circuit implementations to be
explored, enlarging the design space in search of better implementations.
      </p>
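      <p>The scheduling step can be illustrated with a minimal sketch of as-soon-as-possible (ASAP) scheduling over a small data-flow graph. This is a textbook technique chosen here for illustration, not the algorithm of any particular HLS tool; the Op structure, the function names, and the operation latencies are assumptions.</p>
      <preformat><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One node of a data-flow graph (hypothetical layout for this sketch).
struct Op {
    int latency;                    // clock cycles the operation takes
    std::vector<std::size_t> deps;  // indices of the operations it consumes
};

// ASAP scheduling: every operation starts in the first cycle in which
// all of its producers have finished. Resource limits are ignored.
inline std::vector<int> asap_schedule(const std::vector<Op>& ops) {
    std::vector<int> start(ops.size(), 0);
    for (std::size_t i = 0; i < ops.size(); ++i) {
        for (std::size_t d : ops[i].deps) {
            assert(d < i);  // operations must be listed in topological order
            start[i] = std::max(start[i], start[d] + ops[d].latency);
        }
    }
    return start;
}

// Graph for a * b + c * d with assumed latencies:
// two 3-cycle multiplications feeding one 2-cycle addition.
inline std::vector<Op> example_graph() {
    return { {3, {}}, {3, {}}, {2, {0, 1}} };
}
```
]]></preformat>
      <p>With the assumed latencies, both multiplications start in cycle 0 and the addition starts in cycle 3. Because ASAP ignores resource limits, allocation and binding still have to decide how many multipliers are instantiated, which may force a real tool to delay one of the multiplications.</p>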
      <p>Hardware generation frameworks improve RTL design by exposing its abstractions (registers,
modules, combinational circuits, etc.) to general-purpose programming environments. Typically, they
are implemented as an embedded domain-specific language (eDSL), i.e. as a library. Unlike in HLS,
microarchitectural synthesis is not abstracted from the design entry, but can be embedded in multiple custom
generators. HGFs provide a feature-rich environment for specifying RTL generation, offering
programmatic construction of hardware, flexible definition and processing of
configurations, layering of new eDSLs, etc. Programming generators instead of “fixed”
designs enables deep adaptation of the hardware to project needs and constraints. RTL-like
models (such as FIRRTL) are typically used as intermediate representations for applying
optimizations.</p>
      <p>
        With their advantages and drawbacks, both HLS and HGF approaches have gained significant
traction in academic and industrial design. However, their typical application domains differ.
Though HGF is closer to a general-purpose approach (similar to generic RTL), it still
requires digital design expertise from the designers. In addition, the designers must be
programming experts and know the details of how RTL abstractions are embedded in a particular HGF.
HLS (ideally) does not require the designer to be a hardware expert, but targets acceleration
coprocessors with static scheduling of operations and a pipelined microarchitecture. As a result, HLS is
not usually positioned for designing hardware units with custom and dynamic scheduling of
the computational process, including CPUs. Even simple, in-order implementations suffer from
suboptimal performance, mostly because of conservative, static branch scheduling [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        To adopt HLS for CPU-like hardware applications, the following strategies can be implemented
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Definition of microarchitecture explicitly in high-level language. To reflect dynamic
scheduling mechanisms, they can be explicitly programmed in high-level language. For CPU
applications, these mechanisms can include dynamic speculation, instruction reordering, data
forwarding, stalling, etc. Though this approach does not impose restrictions on complexity of these
mechanisms (custom ones can be freely included as well), this approach effectively lowers the design
level, transforming behavioral approach into microarchitectural one. Expertise in hardware
microarchitecture is required to implement this approach.</p>
      <p>
        Allocation of statically scheduled structural units and designing them separately in high-level
environment. Though this approach requires hardware microarchitecture expertise for allocation of
these units and their integration, these units themselves can be extracted for abstract high-level
definition of their behavior and automation of their optimization. For CPU applications,
“computational” execution pipelines (integer, floating-point, DSP, custom ones) can hypothetically be
good candidates for such extraction, since even in complex out-of-order microarchitectures operations
are issued to such units when the data operands are ready, and the number of clock cycles needed
does not depend on other CPU subsystems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In this paper, we explore the case of the floating-point unit,
an important mathematical CPU block that was often implemented as an external co-processor in the
past, and is now typically part of the CPU die, where it can occupy more than 10% of chip area [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>2.2. CPU floating-point unit functionality</title>
      <p>
        CPU floating-point unit (FPU) provides basic operations for numbers represented in floating-point
format. The common format for single precision floating-point number is defined by IEEE-754
standard [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]:
      </p>
      <p><disp-formula id="eq1"><label>(1)</label><tex-math>(-1)^{S} \cdot M \cdot 2^{E},</tex-math></disp-formula>
where S stands for sign, E is exponent, and M is mantissa. The binary IEEE-754 representation
defines a 32-bit word, with one bit for sign, 8 bits for exponent, and 23 bits for mantissa. As a basic
set of floating-point operations, we use those defined in RISC-V architecture – a modern and open
instruction set architecture being widely used both in academia and industry in recent years. An
extension that includes floating-point operations on single precision numbers is denoted RV-F, which
derives from the name of the "Float" data format. RISC-V uses 32 registers for floating-point
numbers, denoted f0 – f31, with a size of 32 bits each. FPU works with both a separate floating-point
register file and a common register file. Therefore, the module must accept and return data in both
float and integer formats.</p>
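      <p>A minimal sketch of how the three fields of such a 32-bit word can be extracted in C++ is given below. It assumes that float is the IEEE-754 binary32 type on the host; the structure and function names are illustrative and not part of the described design. In the binary32 encoding the stored exponent is biased by 127 and normal numbers carry an implicit leading one, which value_of_normal makes explicit.</p>
      <preformat><![CDATA[
```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Raw fields of an IEEE-754 binary32 word (names are illustrative).
struct Binary32Fields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t mantissa;  // 23 bits, fraction without the implicit leading 1
};

// Split a float into its sign, biased exponent, and fraction fields.
inline Binary32Fields decode_binary32(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits wide");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bit pattern
    return Binary32Fields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// Rebuild the value of a normal number from its fields:
// (-1)^S * (1 + M / 2^23) * 2^(E - 127).
inline float value_of_normal(const Binary32Fields& f) {
    assert(f.exponent != 0 && f.exponent != 0xFFu);  // zero, subnormal, inf, NaN excluded
    const float significand = 1.0f + static_cast<float>(f.mantissa) / 8388608.0f;  // 2^23
    const float magnitude = std::ldexp(significand, static_cast<int>(f.exponent) - 127);
    return f.sign ? -magnitude : magnitude;
}
```
]]></preformat>
      <p>For example, -2.5 is stored with sign 1, biased exponent 128, and fraction 0x200000, because 2.5 equals 1.25 times 2 to the power of 1.</p>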
      <p>Table 1 gives a summary of these operations.</p>
    </sec>
    <sec id="sec-6">
      <title>3. Design of HGF-based FPU in BOOM</title>
      <p>SonicBOOM is the third iteration of the Berkeley Out-of-Order Machine (BOOM) project. BOOM is
a high-performance, synthesizable, and parameterizable RV64GC RISC-V core; that is, it
supports the multiplication and division, atomic, single- and double-precision floating-point,
and compressed (short) instruction extensions. BOOM is currently one of the most complete and productive
open-source RISC implementations and demonstrates the main contemporary mechanisms such
as superscalar instruction processing, speculation, branch prediction, cache memory, etc. The core
is designed using the Chisel hardware generation framework.</p>
      <p>Chisel allows flexible construction of class hierarchies of modules for various templates and
communication mechanisms with the rest of the system (see Fig. 1).</p>
      <p>In BOOM, execution of a floating-point instruction occurs in two different modules: fDiv/fSqrt for
calculating the square root and division, and the FPU module that executes all other instructions. For
simplicity, only FPUs without fDiv/fSqrt will be considered.</p>
      <p>BOOM’s FPU consists of four subblocks: sfma for single-precision operations, dfma for
double-precision operations, fpiu for fp-to-int operations, and fpmu for fp-to-fp operations. Calculation
algorithms are specified in “combinational” style, and their results are then passed through register chains
built with Chisel’s Pipe primitive with a configurable delay. After the EDA tool applies retiming, a fully pipelined
implementation with an initiation interval of one clock cycle is obtained. To simplify write port
processing, the delay is set to the same value for all subblocks.</p>
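      <p>The behavior of such a fixed-latency register chain can be modeled in software as follows. This is a C++ sketch of the concept only, not Chisel code and not BOOM's implementation; the DelayLine class and its interface are hypothetical.</p>
      <preformat><![CDATA[
```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Software model of a register chain with a fixed number of stages.
template <typename T, std::size_t Latency>
class DelayLine {
    static_assert(Latency > 0, "at least one register stage is required");

public:
    struct Slot {
        bool valid;  // set when the slot carries a real operation
        T data;
    };

    // Model one clock cycle: accept a new slot and return the slot that
    // entered the chain Latency cycles earlier.
    Slot tick(Slot in) {
        Slot out = stages_[head_];  // oldest entry leaves the chain
        stages_[head_] = in;        // newest entry takes its place
        head_ = (head_ + 1) % Latency;
        return out;
    }

private:
    std::array<Slot, Latency> stages_{};  // value-initialized: all slots invalid
    std::size_t head_ = 0;                // index of the oldest entry
};
```
]]></preformat>
      <p>A new operation can enter on every call to tick, which corresponds to an initiation interval of one, and each result leaves exactly Latency calls after it entered. Using the same Latency for all subblocks keeps their results aligned in time.</p>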
      <p>BOOM uses interfaces and modules from the RocketChip processor core, which uses interfaces
and modules from the Hardfloat core. Clock and reset signals are specified implicitly.</p>
      <p>The interface consists of two buses: the output ExeUnitResp and the input FpuReq. ValidIO is a
built-in Chisel function that creates an interface with a valid signal and
the specified bus type. The output interface resp has type ExeUnitResp, the standard interface for all
BOOM function blocks. ExeUnitResp consists of a data bus and a ValidIO bus with flags. The flag
bus is specified in the same execution unit file and consists of a MicroOp bus for transmitting service
information and a flags field carrying the floating-point exception flags defined in the RISC-V specification. The flags
are part of the FCSR register.</p>
      <p>The input interface req wraps the FpuReq bus in a valid interface. FpuReq has a MicroOp bus, three buses
for transferring data from floating-point registers, and one 5-bit bus for transferring the value of the
exception flags.</p>
      <p>Generation of certain FPU implementation is controlled by 4 parameters:
• minimum instruction length,
• maximum instruction length,
• arithmetic block latency based on SFMA operations,
• arithmetic block latency based on DFMA operations.</p>
      <p>In Fig. 2, the configuration used for FPU implementation is shown.</p>
      <p>Using SonicBOOM generator, FPU implementation has been generated and implemented for
educational Digilent Nexys4-DDR board with Artix-7 FPGA device. We used Vivado 2020.2 for this
task. Resulting characteristics have been compared to a similar implementation synthesized using
Vivado HLS tool (see subsequent Sections).</p>
    </sec>
    <sec id="sec-7">
      <title>4. Designing a FPU module with an HLS tool</title>
    </sec>
    <sec id="sec-8">
      <title>4.1. Designed behavioral model of FPU</title>
      <p>To compare the reference HGF-based design to HLS-generated one, functionally equivalent unit
for HLS has been designed. According to HLS methodology, HLS-based design is a software function
that specifies solely the behavior of the module and does not fixate its microarchitecture (see Fig. 3).</p>
      <preformat>return_floats FPU(t_floats val){
    return_floats val_out = inizialize();
    if (val.funct3 == 0 &amp;&amp; val.funct7 == 0)
        val_out.rd_f = val.rs1 + val.rs2;
    else if (val.funct3 == 0 &amp;&amp; val.funct7 == 4)
        val_out.rd_f = val.rs1 - val.rs2;
    else if (val.funct3 == 0 &amp;&amp; val.funct7 == 8)
        val_out.rd_f = val.rs1 * val.rs2;
    else if (val.funct7 == 16)
        val_out = FSGNJ_FSGNJN_FSGNJX(val, val_out);
    ...
        val_out = FCVTWS_FCVTSW_FCVTWUS_FCVTSWU(val, val_out);
    else if (val.funct3 == 1 &amp;&amp; val.funct7 == 112)
        val_out = FCLASS(val, val_out);
    else
        val_out.err = 0;
    if (isnan(val_out.rd_f) != 0)
        val_out.nan = 1;
    return (val_out);
}</preformat>
      <p>Figure 3: Behavioral FPU design for Vivado HLS (similar code fragments are omitted).</p>
      <p>The designed block is implemented as a branching function, where an operation is
selected based on the funct7 and funct3 RISC-V instruction fields, as well as the value of the rs2
operand.</p>
      <p>There are four sub-functions: the equality comparison FEQ, branching for the sign-injection
operations FSGNJ/FSGNJN/FSGNJX, the format conversion operations
FCVTWS/FCVTSW/FCVTWUS/FCVTSWU, and the operand classification FCLASS.</p>
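      <p>As an illustration of one such sub-function, sign injection can be expressed with the standard library alone. The sketch assumes the funct3 encoding of the RISC-V F extension (0 for FSGNJ, 1 for FSGNJN, 2 for FSGNJX); the function name and signature are hypothetical and simpler than the FSGNJ_FSGNJN_FSGNJX(val, val_out) call of Fig. 3.</p>
      <preformat><![CDATA[
```cpp
#include <cassert>
#include <cmath>

// Sign injection: the magnitude always comes from rs1,
// funct3 selects how the sign of the result is formed.
inline float sign_inject(float rs1, float rs2, unsigned funct3) {
    assert(funct3 <= 2);  // only FSGNJ, FSGNJN, and FSGNJX are handled here
    bool negative = false;
    switch (funct3) {
    case 0:   // FSGNJ: take the sign of rs2
        negative = std::signbit(rs2);
        break;
    case 1:   // FSGNJN: take the inverted sign of rs2
        negative = !std::signbit(rs2);
        break;
    default:  // FSGNJX: XOR of the two sign bits
        negative = std::signbit(rs1) != std::signbit(rs2);
        break;
    }
    // std::copysign is the C++ overload corresponding to copysignf from math.h.
    return std::copysign(rs1, negative ? -1.0f : 1.0f);
}
```
]]></preformat>
      <p>With rs1 equal to rs2, these three operations yield a register move, a negation, and an absolute value, respectively.</p>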
      <p>Input and output signals are specified as structures. The input structure includes:
• floating-point operands
• integer operand
• funct7 and funct3 RISC-V instruction fields
The output structure includes:
• floating-point result
• integer result
• instruction error flag
• NaN flag</p>
      <p>The functions signbit, copysignf, fabsf, fpclassify, islessequal, isgreaterequal, isnan from the C
library “math.h” were used. Compared to native C functions, the math.h library functions can reduce
the use of LUT by 40%, FF by 50%, and achieve a higher clock speed by 62.5%.</p>
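      <p>A classification sub-function in this style might look like the sketch below. It assumes the 10-bit result mask defined for FCLASS in the RISC-V specification and a host where float is IEEE-754 binary32 with the usual quiet-NaN convention (most significant fraction bit set); the name fclass_s is hypothetical and this is not the authors' code.</p>
      <preformat><![CDATA[
```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Returns the FCLASS mask for a single-precision value:
// bit 0 -inf, 1 negative normal, 2 negative subnormal, 3 -0,
// bit 4 +0, 5 positive subnormal, 6 positive normal, 7 +inf,
// bit 8 signaling NaN, 9 quiet NaN.
inline std::uint32_t fclass_s(float x) {
    const bool negative = std::signbit(x);
    std::uint32_t mask = 0;
    switch (std::fpclassify(x)) {
    case FP_INFINITE:  mask = negative ? (1u << 0) : (1u << 7); break;
    case FP_NORMAL:    mask = negative ? (1u << 1) : (1u << 6); break;
    case FP_SUBNORMAL: mask = negative ? (1u << 2) : (1u << 5); break;
    case FP_ZERO:      mask = negative ? (1u << 3) : (1u << 4); break;
    default: {  // NaN: the top fraction bit separates quiet from signaling
        std::uint32_t bits = 0;
        std::memcpy(&bits, &x, sizeof bits);
        mask = (bits & 0x00400000u) ? (1u << 9) : (1u << 8);
        break;
    }
    }
    assert(mask != 0 && (mask & (mask - 1)) == 0);  // exactly one bit is set
    return mask;
}
```
]]></preformat>
      <p>The library calls signbit and fpclassify replace hand-written comparisons on the exponent and fraction fields, which leaves the choice of the comparison logic to the HLS tool.</p>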
      <p>The HLS-based implementation has also been synthesized to RTL, implemented, and tested in hardware
on the Digilent Nexys4-DDR FPGA board.</p>
    </sec>
    <sec id="sec-9">
      <title>4.2. Hardware test infrastructure</title>
      <p>To provide interactive control, observation and debug capability for designed FPGA modules from
PC programming environment, custom infrastructure has been used.</p>
      <p>The key element of this infrastructure is the UDM (UART-based Debug Module) FPGA module (see
Fig. 4). This module can initiate simple bus transactions in the FPGA fabric under the control of a PC
program. UDM is managed via a UART interface, which is lightweight, easy to implement, and available
on all FPGA boards. The protocol between UDM and the PC allows the PC to initiate transactions and
receive responses, and thus to “emulate” a CPU host in custom system-on-chip designs. On the PC,
UDM is supported in a Python 3 environment. Read or write function calls on the PC become requests
appearing on the UDM system bus.</p>
      <p>UDM module consumes minimum amount of hardware resources (&lt;1% of LUTs and flip-flops on
Artix-7 FPGA device), can be implemented in minutes, and requires minimum setup (restricted to
COM port number definition).</p>
      <p>For testing the HLS-based FPU, several control and status registers (CSRs) have been allocated (see
Table 2). These registers have been connected to the FPU and the UDM system bus. Each test iteration
sends the instruction number and the values of the operands, then starts the FPU and reads the error flags
and the result values.</p>
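      <p>The order of accesses in one test iteration can be sketched as follows. The sketch is written in C++ against a software stand-in for the bus, whereas the actual PC-side environment is Python; the register offsets, the instruction numbering, the error flag polarity, and all names are hypothetical, since the real register map is the one given in Table 2.</p>
      <preformat><![CDATA[
```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>

// Hypothetical CSR offsets, invented for this sketch.
enum : std::uint32_t {
    CSR_OP = 0x00, CSR_RS1 = 0x04, CSR_RS2 = 0x08,
    CSR_START = 0x0C, CSR_ERR = 0x10, CSR_RESULT = 0x14
};

inline std::uint32_t to_bits(float f) {
    std::uint32_t u = 0;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float from_bits(std::uint32_t u) {
    float f = 0.0f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Software stand-in for the FPGA side: a register file plus an "FPU"
// that only knows addition (hypothetical instruction number 0).
class MockBus {
public:
    void write32(std::uint32_t addr, std::uint32_t data) {
        regs_[addr] = data;
        if (addr == CSR_START && data == 1) run();
    }
    std::uint32_t read32(std::uint32_t addr) { return regs_[addr]; }

private:
    void run() {
        if (regs_[CSR_OP] == 0) {
            regs_[CSR_RESULT] = to_bits(from_bits(regs_[CSR_RS1]) + from_bits(regs_[CSR_RS2]));
            regs_[CSR_ERR] = 0;
        } else {
            regs_[CSR_ERR] = 1;  // unknown instruction number
        }
    }
    std::map<std::uint32_t, std::uint32_t> regs_;
};

struct TestResult {
    std::uint32_t err;
    float value;
};

// One test iteration: instruction number, operands, start, then read back.
template <typename Bus>
TestResult run_iteration(Bus& bus, std::uint32_t op, float rs1, float rs2) {
    bus.write32(CSR_OP, op);
    bus.write32(CSR_RS1, to_bits(rs1));
    bus.write32(CSR_RS2, to_bits(rs2));
    bus.write32(CSR_START, 1);
    const std::uint32_t err = bus.read32(CSR_ERR);
    const float value = from_bits(bus.read32(CSR_RESULT));
    return TestResult{err, value};
}
```
]]></preformat>
      <p>Because run_iteration depends only on write32 and read32, the same sequence could be driven through any bus object that offers these two calls.</p>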
    </sec>
    <sec id="sec-10">
      <title>5. Comparison of HLS and HGF based implementations</title>
      <p>Resulting characteristics for HGF-based and HLS-based implementations are shown in Table 3.</p>
      <p>It can be seen that the modules have the same initiation interval of one clock cycle, comparable
frequency and resource characteristics.</p>
      <p>HLS-based implementation is faster, but has bigger latency. According to our experiments,
restricting maximum latency is impractical, since it is possible only with close to fold reduction of
frequency. This makes absolute latency almost the same, but reduces bandwidth.</p>
      <p>Also, the HLS-based implementation consumes fewer LUTs, but more flip-flops and DSP blocks. While
heavier DSP utilization (at the expense of general-purpose LUTs) is expected for a high-level
environment, the more than two-fold increase in DSP consumption requires additional investigation. The increased
flip-flop consumption of the HLS-based implementation is likely due to deeper pipelining.</p>
      <p>As for design specification mechanisms, both HLS and HGF make it possible to
set a custom latency. In HLS this is done through pragmas, while in HGF it is done
through explicit parameterization of the pipeline. In fact, the reference HGF-based implementation
relies heavily on retiming in the lower-level RTL synthesis tool. In HLS, since a pragma is a synthesizer
directive, it is easier to change the computation schedule this way than by directly adding
parameters to the module structure. However, since the synthesis is carried out automatically by the
tool, the desired result in HLS must be achieved heuristically.</p>
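      <p>For illustration, a latency constraint in Vivado HLS style could be attached as shown below. PIPELINE and LATENCY are existing Vivado HLS directives, while the kernel itself and the bound of 8 cycles are assumptions and are not taken from the described design. A regular C++ compiler ignores these pragmas, so the function can still be checked in software.</p>
      <preformat><![CDATA[
```cpp
#include <cassert>

// Hypothetical multiply-add kernel used only to show where the
// directives go; it is not part of the FPU described in the paper.
float mul_add_kernel(float a, float b, float c) {
// Request a new input every clock cycle (initiation interval of one).
#pragma HLS PIPELINE II=1
// Ask the scheduler to keep the latency at or below 8 cycles (assumed value).
#pragma HLS LATENCY max=8
    return a * b + c;
}
```
]]></preformat>
      <p>Changing the bound only requires editing the directive, while the function body stays the same. Whether the tool can meet the bound at the target clock period is reported after synthesis.</p>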
      <p>To sum up, designing CPU execution units in high-level synthesis looks promising to implement
high-level, easily extendable, scalable CPU projects, while preserving sufficient quality-of-results.</p>
    </sec>
    <sec id="sec-11">
      <title>6. Future work</title>
      <p>
        In the future, the research is planned to develop in the following directions:
1. The designed HLS-based module is supposed to be integrated in Rocket and/or BOOM
project and validated as part of actual RISC-V CPU;
2. In-depth exploration of the synthesized netlists in HGF and HLS projects and identification of
the discrepancies in their structures;
3. Experimental explicit programming of floating-point computation algorithms in synthesizable
C/C++ instead of relying on HLS tool to synthesize this logic;
4. Exploration of floating-point capabilities in alternative high-level tools, including
open-source ones (LegUp [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], GAUT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]);
5. Exploration of feasibility of high-level synthesis tools for alternative CPU execution pipelines
(integer, DSP, custom ones);
6. Exploration of high-level execution units design targeting ASIC devices.
      </p>
    </sec>
    <sec id="sec-12">
      <title>7. Conclusion</title>
      <p>Raising abstraction level, improving configurability of component base and adopting various
design techniques from software domain is often considered inevitable in hardware designing to
satisfy hardware project constraints at the moment and in the future. Despite the recent improvements
in RTL design offered by hardware generation frameworks, design specification on behavioral level
seems especially promising. However, this transition should be done with regard to quality of results,
which may not be sufficient for the entire diversity of hardware.</p>
      <p>Using the example of a CPU floating-point execution unit, we show that comparable
implementation results for selected elements of a CPU can be achieved at the behavioral level with
automatic synthesis of the unit’s microarchitecture. This motivates further comparative exploration of
the configurability and efficiency of HGF and HLS environments for execution-related and other selected
subsystems of modern CPUs, as well as for other complex hardware projects.</p>
    </sec>
    <sec id="sec-13">
      <title>8. Acknowledgements</title>
      <p>The work has been done in Software Engineering and Computer Systems Faculty of ITMO
University. Design of hardware test infrastructure for interactive control, observation and debug of
custom hardware modules based on FPGA devices (conducted by A. Antonov) has been supported by
Russian Science Foundation, grant № 20-79-00219.</p>
    </sec>
    <sec id="sec-14">
      <title>9. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fingeroff</surname>
          </string-name>
          ,
          <source>High-Level Synthesis Blue Book</source>
          . Xlibris Corporation (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Skalicky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ananthanarayana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lopez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lukowiak</surname>
          </string-name>
          ,
          <article-title>Designing Customized ISA Processors using High Level Synthesis</article-title>
          .
          <source>In: International Conference on ReConFigurable Computing and FPGAs (ReConFig)</source>
          , pp.
          <fpage>0</fpage>
          -
          <lpage>5</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Antonov</surname>
          </string-name>
          ,
          <article-title>Methods and Tools for Computer-Aided Synthesis of Processors Based on Microarchitectural Programmable Hardware Generators</article-title>
          ,
          <source>Ph.D dissertation</source>
          , ITMO University, Saint-Petersburg, http://fppo.ifmo.ru/dissertation/?number=63419, last accessed
          <year>2019</year>
          /05/27.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.P.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.H.</given-names>
            <surname>Lipasti</surname>
          </string-name>
          ,
          <source>Modern Processor Design: Fundamentals of Superscalar Processors</source>
          . Waveland Press (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Hwa-Joon</given-names>
            <surname>Oh</surname>
          </string-name>
          , et al.,
          <article-title>A Fully Pipelined Single-Precision Floating-Point Unit in the Synergistic Processor Element of a CELL Processor</article-title>
          .
          <source>IEEE Journal of Solid-State Circuits</source>
          , Vol.
          <volume>41</volume>
          , No.
          <issue>4</issue>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <article-title>IEEE Standard for Floating-Point Arithmetic</article-title>
          .
          <source>IEEE Std 754-2008</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>70</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <source>RISCV-BOOM's documentation</source>
          , URL: https://docs.boom-core.org/en/latest/sections/executionstages.html, last accessed
          <year>2020</year>
          /11/14.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Antonov</surname>
          </string-name>
          , ActiveCore, URL: https://github.com/AntonovAlexander/activecore, last accessed
          <year>2020</year>
          /11/14.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Canis</surname>
          </string-name>
          , et al.,
          <article-title>LegUp: An open-source high-level synthesis tool for FPGA-based processor/accelerator systems</article-title>
          .
          <source>In: Trans. Embed. Comput. Syst.</source>
          , vol.
          <volume>13</volume>
          , no.
          <issue>2</issue>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Coussy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chavet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bomel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Heller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Senn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <article-title>GAUT: A High-Level Synthesis Tool for DSP Applications, From C algorithm to RTL architecture</article-title>
          .
          <source>In: High-Level Synthesis</source>
          , pp.
          <fpage>147</fpage>
          -
          <lpage>169</lpage>
          , Eds. Springer Netherlands (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>