<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Practical Comparison of High-Level Synthesis and Hardware Generation Frameworks: CPU Floating Point Unit Case</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Oleg</forename><surname>Morozov</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">ITMO University</orgName>
								<address>
									<addrLine>Kronverksky Pr. 49, bldg. A</addrLine>
									<postCode>197101</postCode>
									<settlement>Saint-Petersburg</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Alexander</forename><surname>Antonov</surname></persName>
							<email>antonov@itmo.ru</email>
							<affiliation key="aff0">
								<orgName type="institution">ITMO University</orgName>
								<address>
									<addrLine>Kronverksky Pr. 49, bldg. A</addrLine>
									<postCode>197101</postCode>
									<settlement>Saint-Petersburg</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Practical Comparison of High-Level Synthesis and Hardware Generation Frameworks: CPU Floating Point Unit Case</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">D28AE24C7B970471C167FA3C115BFEA0</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>High-level synthesis</term>
					<term>hardware generation</term>
					<term>hardware microarchitecture</term>
					<term>floating-point unit</term>
					<term>RISC-V</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The research is devoted to analyzing and comparing the advantages and drawbacks of various high-level design environments for the components of modern CPU cores. In the paper, high-level synthesis (HLS) and hardware generation frameworks (HGF) are compared for the case of a floating-point execution unit (FPU). We use the HGF-based FPU available in the open-source SonicBOOM RISC-V CPU design from Berkeley as a reference. An original HLS-based design of the FPU module is proposed. This design is functionally equivalent to the HGF-based one, but is described in a behavioral (untimed) style, and its microarchitecture is optimized automatically by the HLS tool. The designed FPU has been synthesized in Vivado HLS and successfully tested in an FPGA device. The research has shown that raising the abstraction level to the behavioral one yields comparable frequency and resource characteristics, with a significantly more concise design specification and automatic generation of the microarchitecture. Based on these estimations, we envision HLS to be promising not only for accelerators external to CPUs, but also for selective, execution-centric components of modern CPUs themselves.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Hardware design based on the register-transfer level (RTL) and the corresponding design languages (SystemVerilog, VHDL) has dominated industry in recent decades due to efficient abstraction from basic structural devices (gates, multiplexers, etc.), concepts understandable to a wide community of developers, and good support by design tools. However, time-to-market, cost, and complexity restrictions are motivating the exploration of approaches to improve the design process. These improvements include support of algorithmic specifications as design entry, automation of microarchitectural synthesis from high-level specifications and configurations, and ensuring scalability of designs to meet various performance, power, and area constraints.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Theoretical background 2.1. High-level synthesis and hardware generation approaches</head><p>High-level synthesis (HLS) and hardware generation frameworks (HGF) are two widely known approaches to improving the hardware design process. Despite some common priorities (abstract specification, improved configurability, utilization of software experience in the hardware domain), these approaches differ significantly.</p><p>High-level synthesis is typically understood as automated synthesis of a hardware structure from behavioral (algorithmic), untimed specifications, effectively forming a new distinct abstraction level <ref type="bibr" target="#b0">[1]</ref>. The C/C++/SystemC programming languages are typically used as design entry. Microarchitectural synthesis is performed by the tool automatically, and, though it is directed to a certain extent via pragmas and constraints, the design entry is abstracted from it. The majority of HLS tools perform a typical set of operations, including allocation of basic functional units, scheduling of operations with regard to their dependencies and time constraints, and binding of these operations to the allocated functional units. Optimizations are applied to programmatic models (such as the Control and Data Flow Graph, CDFG). The shorter design cycle of behavioral synthesis allows many alternative circuit implementations to be explored, enlarging the design space for better implementations.</p><p>Hardware generation frameworks improve RTL design by exposing its abstractions (registers, modules, combinational circuits, etc.) to general-purpose programming environments. Typically, they are implemented as an embedded domain-specific language (eDSL), i.e. as a library. Unlike in HLS, microarchitectural synthesis is not abstracted away in the design entry, but can be embedded in multiple custom generators. 
HGFs provide a feature-rich environment for the specification of RTL generation, offering programmatic construction of hardware, improved flexibility in defining and processing configurations, layering of new eDSLs, etc. Facilitating the programming of generators instead of "fixed" designs enables deep adaptation of the hardware to project needs and constraints. RTL-like models (such as FIRRTL) are typically used as intermediate representations for the application of optimizations.</p><p>With their advantages and drawbacks, both the HLS and HGF approaches have gained significant traction in academic and industrial design. However, their typical application domains differ somewhat. Though HGF is closer to a general-purpose approach (similar to generic RTL), it still requires digital design expertise from the designers. Also, the designers should simultaneously be programming experts and know the details of how RTL abstractions are embedded in a certain HGF. HLS (ideally) does not require the designer to be a hardware expert, but targets acceleration coprocessors with static scheduling of operations and a pipelined microarchitecture. As a result, HLS is not usually positioned for designing hardware units with custom and dynamic scheduling of the computational process, including CPUs. Even simple, in-order implementations suffer from suboptimal performance, mostly because of conservative, static branch scheduling <ref type="bibr" target="#b1">[2]</ref>.</p><p>To adapt HLS for CPU-like hardware applications, the following strategies can be implemented <ref type="bibr" target="#b2">[3]</ref>.</p><p>Definition of the microarchitecture explicitly in a high-level language. To reflect dynamic scheduling mechanisms, they can be explicitly programmed in the high-level language. For CPU applications, these mechanisms can include dynamic speculation, instruction reordering, data forwarding, stalling, etc. 
Though this approach does not impose restrictions on the complexity of these mechanisms (custom ones can be freely included as well), it effectively lowers the design level, transforming the behavioral approach into a microarchitectural one. Expertise in hardware microarchitecture is required to implement it.</p><p>Allocation of statically scheduled structural units and designing them separately in a high-level environment. Though this approach requires hardware microarchitecture expertise for the allocation of these units and their integration, the units themselves can be extracted for an abstract high-level definition of their behavior and automation of their optimization. For CPU applications, "computational" execution pipelines (integer, floating-point, DSP, custom ones) can hypothetically be good candidates for such extraction, since even in complex out-of-order microarchitectures operations are issued to such units when the data operands are ready, and the number of clock cycles needed does not depend on other CPU subsystems <ref type="bibr" target="#b3">[4]</ref>. In this paper, we explore the case of the floating-point unit, an important mathematical CPU block that was often implemented as an external co-processor in the past, and now is typically a part of the CPU die and can occupy more than 10% of the chip area <ref type="bibr" target="#b4">[5]</ref>.</p></div>
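As a brief illustration of the HLS design entry described above, the following untimed C++ function is a minimal sketch of our own (not code from the paper): the Vivado HLS PIPELINE pragma directs scheduling without fixing the microarchitecture in the source, while a plain C++ compiler simply ignores the unknown pragma.

```cpp
#include <cstddef>

// Hypothetical behavioral (untimed) design entry for an HLS tool.
// The tool would allocate multipliers/adders, schedule the loop's
// operations with regard to dependencies, and bind them to functional
// units; the pragma merely requests a pipelined schedule with an
// initiation interval of one clock cycle.
float dot4(const float a[4], const float b[4]) {
#pragma HLS PIPELINE II=1
    float acc = 0.0f;
    for (std::size_t i = 0; i < 4; ++i)
        acc += a[i] * b[i];
    return acc;
}
```

In software simulation the function behaves as ordinary C++; only the HLS tool interprets the pragma and derives a microarchitecture from it.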
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">CPU floating point unit functionality</head><p>The CPU floating-point unit (FPU) provides basic operations for numbers represented in floating-point format. The common format for single-precision floating-point numbers is defined by the IEEE-754 standard <ref type="bibr" target="#b5">[6]</ref>:</p><formula xml:id="formula_0">(−1)^S × M × 2^E,<label>(1)</label></formula><p>where S stands for the sign, E is the exponent, and M is the mantissa. The binary IEEE-754 representation defines a 32-bit word, with one bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa. As a basic set of floating-point operations, we use those defined in the RISC-V architecture, a modern and open instruction set architecture that has been widely used both in academia and industry in recent years. The extension that includes floating-point operations on single-precision numbers is denoted RV-F, which derives from the name of the "Float" data format. RISC-V uses 32 registers for floating-point numbers, denoted f0 to f31, each 32 bits in size. The FPU works with both a separate floating-point register file and the common register file. Therefore, the module must accept and return data in both float and integer formats.</p><p>Table <ref type="table" target="#tab_0">1</ref> gives a summary of these operations.</p></div>
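The field layout of Eq. (1) can be checked with a few lines of C++ (a sketch of ours; the struct and function names are illustrative, not from the paper):

```cpp
#include <cstdint>
#include <cstring>

// Decode the IEEE-754 single-precision fields of Eq. (1):
// bit 31 is the sign S, bits 30..23 the exponent E (biased by 127),
// and bits 22..0 the mantissa M.
struct Fields { uint32_t sign, exponent, mantissa; };

Fields decode(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // reinterpret the 32-bit word
    return { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

For example, decode(1.0f) yields sign 0, biased exponent 127, and mantissa 0.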
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Design of HGF-based FPU in BOOM</head><p>SonicBOOM is the third iteration of the Berkeley Out-Of-Order Machine (BOOM) project. BOOM is a high-performance, synthesizable and parameterizable RV64GC RISC-V core, which means it supports the multiplication and division extensions, atomic operations, single- and double-precision floating-point operations, and compressed (short) instructions. BOOM is currently one of the most complete and performant open-source RISC-V implementations and demonstrates the use of the main contemporary mechanisms, such as superscalar instruction processing, speculation, branch prediction, cache memory, etc. The core is designed with the Chisel hardware generation framework.</p><p>Chisel allows class hierarchies of modules to be constructed flexibly for various templates and communication mechanisms with the rest of the system (see Fig. <ref type="figure">1</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1:</head><p>Class hierarchy for functional units of the SonicBOOM RISC-V CPU <ref type="bibr" target="#b6">[7]</ref>.</p><p>In BOOM, execution of a floating-point instruction occurs in two different modules: fDiv/fSqrt, which calculates square roots and divisions, and the FPU module, which executes all other instructions. For simplicity, only the FPU, without fDiv/fSqrt, will be considered.</p><p>BOOM's FPU consists of four subblocks: sfma for single-precision operations, dfma for double-precision operations, fpiu for fp-to-int operations, and fpmu for fp-to-fp operations. Calculation algorithms are specified in a "combinational" style and successively copied into register chains using Chisel's Pipe primitive with a configurable delay. After the EDA tool applies retiming, a fully pipelined implementation with an initiation interval of one clock cycle is obtained. To simplify write-port processing, the delay is set to the same value for all subblocks.</p><p>BOOM uses interfaces and modules from the RocketChip processor core, which in turn uses interfaces and modules from the Hardfloat core. Using the SonicBOOM generator, an FPU implementation has been generated and implemented for the educational Digilent Nexys4-DDR board with an Artix-7 FPGA device. We used Vivado 2020.2 for this task. The resulting characteristics have been compared to a similar implementation synthesized using the Vivado HLS tool (see subsequent Sections).</p></div>
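The Pipe-based register chain described above can be modeled functionally as a fixed-depth delay line that accepts one new operation per clock cycle (a C++ sketch of ours, not Chisel code; PipeModel and tick are illustrative names):

```cpp
#include <deque>

// Functional model of a Pipe-style register chain: a combinationally
// computed result is delayed by a configurable number of clock cycles,
// and a new operation is accepted every cycle (initiation interval = 1).
template <typename T>
class PipeModel {
    std::deque<T> stages;  // register chain contents, oldest at the front
public:
    explicit PipeModel(int latency) : stages(latency, T{}) {}
    // One clock edge: shift the new result in, shift the oldest out.
    T tick(T in) {
        stages.push_back(in);
        T out = stages.front();
        stages.pop_front();
        return out;
    }
};
```

With latency 3, a value fed in at cycle N appears at the output at cycle N+3, which mirrors how BOOM's subblocks use the same delay to simplify write-port processing.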
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Designing an FPU module with an HLS tool 4.1. Designed behavioral model of FPU</head><p>To compare the reference HGF-based design to the HLS-generated one, a functionally equivalent unit for HLS has been designed. According to the HLS methodology, the HLS-based design is a software function that specifies solely the behavior of the module and does not fix its microarchitecture (see Fig. <ref type="figure">3</ref>). The structure of the designed block is implemented as a branching function, where an operation is selected based on the func7 and func3 RISC-V instruction fields, as well as the value of the rs2 operand.</p><p>The functions signbit, copysignf, fabsf, fpclassify, islessequal, isgreaterequal, and isnan from the C library "math.h" were used. Compared to equivalent native C code, the math.h library functions reduced LUT usage by 40% and flip-flop usage by 50%, and achieved a 62.5% higher clock speed. The HLS-based implementation has also been synthesized to RTL, implemented, and tested in hardware on the Digilent Nexys4-DDR FPGA board.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Hardware test infrastructure</head><p>To provide interactive control, observation, and debug capability for the designed FPGA modules from a PC programming environment, a custom infrastructure has been used.</p><p>The key element of this infrastructure is the UDM (UART-based Debug Module) FPGA module (see Fig. <ref type="figure" target="#fig_1">4</ref>). This module can initiate simple bus transactions in the FPGA fabric under the control of a PC program. The UDM is managed via a UART interface that is lightweight, easy to implement, and available on all FPGA boards. The protocol between the UDM and the PC allows transactions to be initiated and responses to be received. This allows the PC to "emulate" a CPU host in custom system-on-chip designs. On the PC, the UDM is supported in a Python 3 environment. Read or write function calls on the PC become requests appearing on the UDM system bus. For testing the HLS-based FPU, several control and status registers (CSRs) have been allocated (see Table <ref type="table" target="#tab_3">2</ref>). These registers have been connected to the FPU and to the UDM system bus. Each test iteration sends the instruction number and the values of the operands, then starts the FPU and reads the error flags and the result values.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Comparison of HLS and HGF based implementations</head><p>The resulting characteristics of the HGF-based and HLS-based implementations are shown in Table <ref type="table" target="#tab_4">3</ref>. It can be seen that the modules have the same initiation interval of one clock cycle and comparable frequency and resource characteristics.</p><p>The HLS-based implementation is faster, but has a bigger latency. According to our experiments, restricting the maximum latency is impractical, since it is possible only with a close to two-fold reduction of frequency. This makes the absolute latency almost the same, but reduces bandwidth. Also, the HLS-based implementation consumes fewer LUTs, but more flip-flops and DSP blocks. While DSP utilization (at the expense of general-purpose LUTs) is predictably better for the high-level environment, the more than two-fold consumption of DSPs requires additional investigation. The increased flip-flop consumption of the HLS-based implementation is likely due to deeper pipelining.</p><p>When it comes to design specification mechanisms, a custom latency can be set both in HLS and in HGF. In HLS this is done through pragmas, while in HGF it is done through explicit parameterization of the pipeline. In fact, the reference HGF-based implementation relies heavily on retiming in the lower-level RTL synthesis tool. In HLS, since a pragma is a synthesizer directive, it is easier to change the computation schedule this way than by directly adding parameters to the module structure. However, since the synthesis is carried out automatically by the tool, the desired result in HLS must be achieved heuristically.</p><p>To sum up, designing CPU execution units in high-level synthesis looks promising for implementing high-level, easily extendable, scalable CPU projects, while preserving sufficient quality of results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Future work</head><p>In the future, the research is planned to develop in the following directions: 1. The designed HLS-based module is to be integrated into the Rocket and/or BOOM project and validated as part of an actual RISC-V CPU; 2. In-depth exploration of the synthesized netlists in the HGF and HLS projects and identification of the discrepancies in their structures; 3. Experimental explicit programming of floating-point computation algorithms in synthesizable C/C++ instead of relying on the HLS tool to synthesize this logic; 4. Exploration of floating-point capabilities in alternative high-level tools, including open-source ones (LegUp <ref type="bibr" target="#b8">[9]</ref>, GAUT <ref type="bibr" target="#b9">[10]</ref>); 5. Exploration of the feasibility of high-level synthesis tools for alternative CPU execution pipelines (integer, DSP, custom ones); 6. Exploration of high-level execution unit design targeting ASIC devices.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>Raising the abstraction level, improving the configurability of the component base, and adopting various design techniques from the software domain are often considered inevitable in hardware design to satisfy hardware project constraints now and in the future. Despite the recent improvements in RTL design offered by hardware generation frameworks, design specification at the behavioral level seems especially promising. However, this transition should be made with regard to the quality of results, which may not be sufficient for the entire diversity of hardware.</p><p>Using the example of a CPU floating-point execution unit, we show that comparable implementation results for selected elements of a CPU can be achieved at the behavioral level and with automatic synthesis of the unit's microarchitecture. This motivates further comparative exploration of the configurability and efficiency of HGF and HLS environments for execution-related and other selected subsystems of modern CPUs, as well as other complex hardware projects.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2:</head><label>2</label><figDesc>Clock and reset signals are specified implicitly. The interface consists of two buses, the output ExeUnitResp and the input FpuReq. ValidIO is a built-in Chisel function that implements the creation of an interface with a valid enable signal and the specified bus type. The output interface resp has type ExeUnitResp, the standard interface for all BOOM function blocks. ExeUnitResp consists of a data bus and a ValidIO bus with flags. The flag bus is specified in the same execution unit file and consists of a MicroOp bus for transmitting service information and a flags bus for the floating-point exception flags from the RISC-V specification. The flags are part of the FCSR register. The input interface req consists of the valid FpuReq interface. 
It has a MicroOp bus, three buses for transferring data from the floating-point registers, and one 5-bit bus for transferring the value of the exception flags. Generation of a certain FPU implementation is controlled by four parameters: • minimum instruction length, • maximum instruction length, • arithmetic block latency for SFMA operations, • arithmetic block latency for DFMA operations. In Fig. 2, the configuration used for the FPU implementation is shown: case class FPUParams( minFLen: Int = 32, fLen: Int = 64, … sfmaLatency: Int = 3, dfmaLatency: Int = 4). FPU configuration used for generation of the implementation.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Infrastructure for interactive hardware testing of custom FPGA-based designs.</figDesc><graphic coords="6,124.25,268.88,345.90,131.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc></figDesc><table><row><cell cols="2">Floating-point operations defined in RISC-V architecture</cell></row><row><cell>Operation</cell><cell>Description</cell></row><row><cell>FADD, FSUB, FMUL, FMIN, FMAX</cell><cell>Arithmetic functions, input and output are float</cell></row><row><cell>FSGNJ, FSGNJN, FSGNJX</cell><cell>Sign-injection instructions, input and output are float</cell></row></table><note>FEQ, FLT, FLE, FCLASS Comparison operations, input is float, output is integer FCVT.W.S, FCVT.S.W, FCVT.WU.S, FCVT.S.WU Transfer operations from float to integer and vice versa. FMADD, FMSUB, FNMSUB, FNMADD Floating-point fused multiply-add instructions, input and output are float</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>Behavioral FPU design for Vivado HLS (similar code fragments are omitted).</figDesc><table><row><cell>return_floats FPU(t_floats val){</cell></row><row><cell>return_floats val_out = initialize();</cell></row><row><cell>if (val.funct3 == 0 &amp;&amp; val.funct7 == 0)</cell></row><row><cell>val_out.rd_f = val.rs1 + val.rs2;</cell></row><row><cell>else if (val.funct3 == 0 &amp;&amp; val.funct7 == 4)</cell></row><row><cell>val_out.rd_f = val.rs1 - val.rs2;</cell></row><row><cell>else if (val.funct3 == 0 &amp;&amp; val.funct7 == 8)</cell></row><row><cell>val_out.rd_f = val.rs1 * val.rs2;</cell></row><row><cell>else if (val.funct7 == 16)</cell></row><row><cell>val_out = FSGNJ_FSGNJN_FSGNJX(val, val_out);</cell></row><row><cell>...</cell></row><row><cell>val_out = FCVTWS_FCVTSW_FCVTWUS_FCVTSWU(val, val_out);</cell></row><row><cell>else if (val.funct3 == 1 &amp;&amp; val.funct7 == 112)</cell></row><row><cell>val_out = FCLASS(val, val_out);</cell></row><row><cell>else</cell></row><row><cell>val_out.err = 0;</cell></row><row><cell>if (isnan(val_out.rd_f) != 0)</cell></row><row><cell>val_out.nan = 1;</cell></row><row><cell>return (val_out);</cell></row><row><cell>}</cell></row><row><cell>Figure 3:</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2</head><label>2</label><figDesc>CSRs allocated for FPU hardware testing.</figDesc><table><row><cell>Address</cell><cell>Mnemonic</cell><cell>Description</cell></row><row><cell>0x08</cell><cell>FPU_START</cell><cell>Enabling signal</cell></row><row><cell>0x0C</cell><cell>FUNC7</cell><cell>Unsigned 7-bit RISC-V instruction field</cell></row><row><cell>0x10</cell><cell>FUNC3</cell><cell>Unsigned 3-bit RISC-V instruction field</cell></row><row><cell>0x14 0x18</cell><cell>RS1 [31:0] RS1 [63:32]</cell><cell>First source floating-point register value</cell></row><row><cell>0x1C 0x20</cell><cell>RS2 [31:0] RS2 [63:32]</cell><cell>Second source floating-point register value</cell></row><row><cell>0x24 0x28</cell><cell>RS3 [31:0] RS3 [63:32]</cell><cell>Third source floating-point register value</cell></row><row><cell>0x2C</cell><cell>RSI</cell><cell>Source integer register value</cell></row><row><cell>0x30 0x34</cell><cell>RESULT [31:0] RESULT [63:32]</cell><cell>Result floating-point register value</cell></row><row><cell>0x38</cell><cell>RESULT_I</cell><cell>Result integer register value</cell></row><row><cell>0x3C</cell><cell>FLAG_NAN</cell><cell>Flag indicating that the result is NaN</cell></row><row><cell>0x40</cell><cell>FLAG_ERROR</cell><cell>Flag indicating that the function code is invalid</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3</head><label>3</label><figDesc>Comparison of HGF and HLS based implementations.</figDesc><table><row><cell>Characteristics</cell><cell>HGF-based module (reference)</cell><cell>HLS-based module (designed)</cell></row><row><cell>Top frequency</cell><cell>92 MHz</cell><cell>136 MHz</cell></row><row><cell>Initiation interval</cell><cell>1 clock cycle</cell><cell>1 clock cycle</cell></row><row><cell>Latency</cell><cell>4 clock cycles</cell><cell>10 clock cycles</cell></row><row><cell>LUT</cell><cell>4738</cell><cell>3441</cell></row><row><cell>Flip-flops</cell><cell>1454</cell><cell>2929</cell></row><row><cell>DSP</cell><cell>11</cell><cell>26</cell></row><row><cell>Lines of code</cell><cell>230 (+1200 in HardFloat)</cell><cell>120</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Acknowledgements</head><p>The work has been done at the Software Engineering and Computer Systems Faculty of ITMO University. The design of the hardware test infrastructure for interactive control, observation, and debug of custom FPGA-based hardware modules (conducted by A. Antonov) has been supported by the Russian Science Foundation, grant № 20-79-00219.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">High-Level Synthesis Blue Book</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fingeroff</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
			<publisher>Xlibris Corporation</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Designing Customized ISA Processors using High Level Synthesis</title>
		<author>
			<persName><forename type="first">S</forename><surname>Skalicky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ananthanarayana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lopez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lukowiak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on ReConFigurable Computing and FPGAs (ReConFig)</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="0" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Methods and Tools for Computer-Aided Synthesis of Processors Based on Microarchitectural Programmable Hardware Generators</title>
		<author>
			<persName><forename type="first">A</forename><surname>Antonov</surname></persName>
		</author>
		<ptr target="http://fppo.ifmo.ru/dissertation/?number=63419" />
		<imprint>
			<date type="published" when="2019-05-27">2019/05/27</date>
			<pubPlace>Saint-Petersburg</pubPlace>
		</imprint>
		<respStmt>
			<orgName>ITMO University</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D dissertation</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Modern Processor Design: Fundamentals of Superscalar Processors</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">H</forename><surname>Lipasti</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>Waveland Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A Fully Pipelined Single-Precision Floating-Point Unit in the Synergistic Processor Element of a CELL Processor</title>
		<author>
			<persName><forename type="first">Hwa-Joon</forename><surname>Oh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Journal of Solid-State Circuits</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="issue">4</biblScope>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">IEEE Standard for Floating-Point Arithmetic</title>
	</analytic>
	<monogr>
		<title level="j">IEEE Std</title>
		<imprint>
			<biblScope unit="volume">754</biblScope>
			<biblScope unit="page" from="1" to="70" />
			<date type="published" when="2008">2008. 2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<ptr target="https://docs.boom-core.org/en/latest/sections/execution-stages.html" />
		<title level="m">RISCV-BOOM&apos;s documentation</title>
				<imprint>
			<date type="published" when="2020-11-14">2020/11/14</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Antonov</surname></persName>
		</author>
		<ptr target="https://github.com/AntonovAlexander/activecore" />
		<title level="m">ActiveCore</title>
				<imprint>
			<date type="published" when="2020-11-14">2020/11/14</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">LegUp: An open-source high-level synthesis tool for FPGA-based processor/accelerator systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Canis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Trans. Embed. Comput. Syst</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">GAUT: A High-Level Synthesis Tool for DSP Applications, From C algorithm to RTL architecture</title>
		<author>
			<persName><forename type="first">P</forename><surname>Coussy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chavet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bomel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Heller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Senn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Martin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">High-Level Synthesis</title>
				<meeting><address><addrLine>Netherlands</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="147" to="169" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
