
An Accuracy-Controllable Approximate Adder for FPGAs

Masaki Sano

Hiroki Nishikawa

Xiangbo Kong

Hiroyuki Tomiyama

Tongxin Yang

Tomoaki Ukezono

Toshinori Sato

0 Department of Electronics Engineering and Computer Science, Fukuoka University, Fukuoka, Japan

1 Graduate School of Information Science and Technology, Osaka University, Osaka, Japan

2 Graduate School of Science and Engineering, Ritsumeikan University, Shiga, Japan

3 Sony Semiconductor Solutions Corporation, Kanagawa, Japan

In this paper, we propose an accuracy-controllable approximate adder for FPGAs. The proposed adder has a special input to dynamically change the accuracy in addition to two operands. When accurate computation is required, the adder computes accurately. On the other hand, when accurate computation is not required, the adder computes inaccurately but quickly at low power. The important feature of our adder is that it utilizes carry-chain modules which are built in FPGAs. By using the carry chains, our approximate adder computes much faster at lower power than an existing approximate adder.

Keywords: approximate computing, approximate adder, FPGA

1. Introduction

In recent years, approximate computing has attracted attention as a way to achieve higher performance and lower power consumption by tolerating a certain degree of computational error. Approximate computing techniques are used especially in image processing and machine learning, since these workloads are computationally expensive and error-tolerant to some extent [1][2].

Research on approximate computing circuits has been conducted at various design levels, from the transistor level to the architecture level [3][4]. This paper focuses on approximate adders since addition is one of the most fundamental arithmetic operations. There are many studies on approximate adders which improve power-performance efficiency by disconnecting carry propagation [5]-[9]. The work in [10] proposes an approach to accurately calculate errors of approximate adders, and the work in [11] provides a detailed analysis of the trade-off between accuracy and resource efficiency. The authors of [12][13] state that circuit design with variable computational accuracy is desirable for designing systems that meet diverse requirements. An approximate adder circuit named the carry-maskable adder (CMA), which can change its computational accuracy, is proposed in [14]. Since CMA enables dynamic control of accuracy, it is possible to perform approximate operations within an error-tolerable range.

Based on the CMA in [14], this paper proposes an accuracy-controllable approximate adder for FPGAs. If the original CMA is implemented on FPGAs in a straightforward manner, lookup tables (LUTs) are connected in series, and hence the delay and power increase significantly. Our proposed adder, named the carry-chain based carry-maskable adder (CC-CMA), takes advantage of the fast carry-chain modules built into FPGAs.

This paper is organized as follows. Section 2 presents CC-CMA, and Section 3 analyzes the hardware cost, delay, computational error and power consumption of CC-CMA. Finally, Section 4 summarizes this paper and discusses future work.

This work was done while the author was with Fukuoka University, Japan.

2. Carry-Chain based Carry-Maskable Adder

In this section, we first explain the carry-maskable adder [14], and then, we propose a carry-chain based carry-maskable adder for FPGAs.

2.1. Carry-Maskable Adder

This work is based on an approximate adder, named carry-maskable adder (CMA), which was originally proposed in [14]. CMA can dynamically change its accuracy level according to a special input signal. Figure 1 (a) shows the diagram of an 8-bit CMA. The 8-bit CMA consists of a carry-maskable half adder (CMHA) and seven carry-maskable full adders (CMFAs) connected in series. The CMFA is shown in Figure 1 (b). In addition to three inputs x, y and carry-in denoted as cin, CMFA has a special input named mask to dynamically control the accuracy. If the mask is 0, CMFA performs exact addition. If the mask is 1, the carry-out signal cout is 0, and the sum s is the logical sum (OR) of x and y, assuming that the carry-in signal from the lower bit is 0. This computation is not accurate but approximate. However, since carry signals are not propagated from lower bits to upper bits, the delay and power consumption are reduced. By setting the masks of lower bits to 1 and those of upper bits to 0, the upper (more significant) bits are computed accurately, and the lower (less significant) bits are computed approximately. Thus, by controlling the masks, we can explore the trade-off between accuracy, delay, and power consumption, depending on the requirements of the application.
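The behavior described above can be illustrated with a small bit-level model in Python. This is only a sketch under assumptions, not the circuit of [14]: the function names `cmfa` and `cma_add` are made up for illustration, a masked cell is assumed to ignore its carry-in entirely, and the lowest cell (the CMHA) is modeled as a CMFA with carry-in 0.

```python
def cmfa(x, y, cin, mask):
    """One carry-maskable full adder cell (illustrative model); returns (s, cout)."""
    if mask:
        # Approximate mode: the carry is cut and the sum is the logical OR.
        # Assumption: the carry-in is ignored when the cell is masked.
        return x | y, 0
    # Exact mode: ordinary full adder.
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def cma_add(x, y, mask, width=8):
    """Ripple the cells from LSB to MSB; `mask` holds one mask bit per position."""
    carry = 0  # the lowest cell is a half adder, so it has no carry-in
    result = 0
    for i in range(width):
        s, carry = cmfa((x >> i) & 1, (y >> i) & 1, carry, (mask >> i) & 1)
        result |= s << i
    return result | (carry << width)  # final carry-out is kept as bit `width`


print(cma_add(0b10110111, 0b01001101, 0b00000000))  # all masks 0: exact, 260
print(cma_add(0b10110111, 0b01001101, 0b00001111))  # lower 4 bits masked: 255
```

In the second call the lower four bits are ORed (0111 OR 1101 = 1111) and no carry enters the upper four bits, so the result is 255 instead of the exact 260.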

Figure 1. Carry-maskable adder [14]: (a) 8-bit CMA, (b) carry-maskable full adder

2.2. Carry-Chain Based CMA

Originally, CMA was designed for ASICs, and how to implement CMA on FPGAs is neither presented nor discussed in [14], despite the widespread use of FPGAs. If the Boolean expressions of CMA are given to FPGA synthesis tools, each CMFA is mapped to one or two lookup tables (LUTs), and the LUTs are connected in series, as shown in Figure 1 (a). This implementation is not efficient in terms of delay and power consumption since LUTs are slow and power-consuming.

Recent FPGAs are equipped with carry-chain modules for fast addition. Figure 2 shows a schematic diagram of an accurate 4-bit adder using a built-in carry-chain module. The carry-chain module consists of four multiplexers and four EXOR gates, and is driven by four LUTs. For accurate addition, each LUT is configured to compute EXOR of x and y as follows.

$$\mathrm{LUT}_i = x_i \oplus y_i \qquad (1)$$

As seen in Figure 2, carry signals go through the carry-chain module. In other words, the carry signals do not go through LUTs, which are slow and power-consuming. Thanks to the built-in carry-chain module, addition is computed quickly at low power.
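To make the roles of the LUTs, multiplexers and EXOR gates concrete, the following Python sketch models a 4-bit accurate adder built on such a carry chain. It is a behavioral illustration only. The function name `carry_chain_add4` is hypothetical, and the choice of x_i as the multiplexer data input is an assumption based on the common way such chains are used (when the EXOR of x_i and y_i is 0, both inputs are equal, so either one gives the carry-out).

```python
def carry_chain_add4(x, y, cin=0):
    """4-bit exact addition through a modeled carry chain; returns (sum, carry_out)."""
    carry = cin
    total = 0
    for i in range(4):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        p = xi ^ yi                  # LUT output: EXOR of x and y
        total |= (p ^ carry) << i    # chain EXOR gate produces the sum bit
        # Chain multiplexer: propagate the incoming carry when p is 1,
        # otherwise take x_i (assumed data input) as the new carry.
        carry = carry if p else xi
    return total, carry


s, c = carry_chain_add4(0b1011, 0b0110)
print(s + (c << 4))  # equals 0b1011 + 0b0110
```

Note that the carry only ever passes through the multiplexers, which mirrors the observation that carry signals bypass the LUTs.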

In this work, we take advantage of the carry-chain modules in the design of approximate adders. Figure 3 shows our proposed approximate adder, named carry-chain based carry-maskable adder (CC-CMA). LUTs labeled P0-P3 and G0-G3 compute Equations (2) and (3), respectively.

Similar to the accurate adder shown in Figure 2, carry signals in CC-CMA do not go through LUTs but go through fast carry-chain modules when mask is 0. When mask is 1, carry signals do not propagate to upper bits, which achieves faster and lower-power computation than the accurate adder at the expense of computational inaccuracy.
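The following Python sketch shows one way LUT functions could be chosen so that a carry chain reproduces the carry-maskable behavior. The P and G functions used here are an assumed choice for illustration and are not necessarily the paper's Equations (2) and (3). The function name `cc_cma_add` is hypothetical, and the sketch assumes that the masked bits form a contiguous block starting at the least significant bit, so that every masked cell sees a carry-in of 0.

```python
def cc_cma_add(x, y, mask, width=8):
    """Carry-chain style model of a carry-maskable adder (illustrative only)."""
    carry = 0
    result = 0
    for i in range(width):
        xi, yi, mi = (x >> i) & 1, (y >> i) & 1, (mask >> i) & 1
        # Hypothetical P LUT: OR when masked, EXOR when exact.
        p = (xi | yi) if mi else (xi ^ yi)
        # Hypothetical G LUT: forced to 0 when masked, AND when exact.
        g = 0 if mi else (xi & yi)
        result |= (p ^ carry) << i   # chain EXOR gate produces the sum bit
        carry = carry if p else g    # chain multiplexer selects the next carry
    return result | (carry << width)


print(cc_cma_add(0b10110111, 0b01001101, 0b00001111))  # lower 4 bits masked: 255
```

With the low-order masks set, the incoming carry of each masked cell is 0, so the chain EXOR passes the OR value through as the sum bit and the multiplexer never produces a carry. Once the first unmasked cell is reached, the model behaves like an ordinary carry-chain adder.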

3. Evaluation

We have designed 32-bit and 64-bit CC-CMAs in Verilog-HDL, and synthesized them for a Xilinx Artix-7 device with Xilinx Vivado 2019.2. For comparison, we have also designed accurate adders and original CMAs [14] in Verilog-HDL. The synthesized accurate adders utilize built-in carry-chain modules, as shown in Figure 2, while the synthesized CMAs do not. The three adders are compared in terms of hardware resources, delay, power consumption, maximum error and average error. Delay, average error and power consumption are obtained by post-synthesis simulation using the Vivado toolkit. For 32-bit and 64-bit adders, 100,000 and 1,000,000 random simulations are performed, respectively. Delay, error and power consumption of CMA and CC-CMA depend on the values of the masks. Recall that CMA and CC-CMA compute accurately when the masks of all bits are 0. When the masks of the least significant n bits are set to 1, the lower n bits are added approximately, and the upper bits are added accurately. In our experiments, we vary the number of lower bits whose masks are set to 1.

3.1. Hardware Resources

Table 1 compares hardware resources in terms of the number of 6-input LUTs. The original CMAs use twice as many LUTs as the accurate adders and CC-CMAs. CC-CMAs are as small as the accurate adders, although the functionality of CC-CMAs is more complex.

Table 1. Number of 6-input LUTs (columns: 32-bit adders, 64-bit adders)

3.3. Power Consumption

Figure 5. Power consumption (µW): (b) 64-bit adders

... leading to lower power consumption. It is also observed in Figure 5 that the power consumption of CMA and CC-CMA decreases as more bits are masked and approximated.

3.4. Computational Errors

So far, we have seen that CC-CMA computes faster at lower power than the accurate adders. These advantages come at the cost of computational error. Table 2 shows the degree of computational errors of CMA and CC-CMA. Recall that the functions of CMA and CC-CMA are exactly the same, and therefore, the amounts of errors of the two adders are the same. The average errors are obtained by random simulation. As more bits are masked, the computational error increases.
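A rough software analogue of this error measurement can be written in Python. The sketch below makes several assumptions that are not stated in the paper: the error is taken as the absolute difference between the exact sum and the approximate sum, the inputs are uniformly random, and the names `approx_add` and `error_stats` are hypothetical. It models only the arithmetic behavior (lower k bits ORed without carry, upper bits added exactly), not the post-synthesis simulation.

```python
import random


def approx_add(x, y, k):
    """Add with the k least significant bits masked (OR, no carry), the rest exact."""
    low = (1 << k) - 1
    return (((x >> k) + (y >> k)) << k) | ((x | y) & low)


def error_stats(width, k, trials=100_000, seed=0):
    """Return (maximum error, average error) over random operand pairs."""
    rng = random.Random(seed)
    worst = 0
    total = 0
    for _ in range(trials):
        x, y = rng.getrandbits(width), rng.getrandbits(width)
        err = (x + y) - approx_add(x, y, k)  # never negative, see note below
        worst = max(worst, err)
        total += err
    return worst, total / trials


for k in (4, 8, 16):
    print(k, error_stats(32, k))
```

Because a + b = (a OR b) + (a AND b) for non-negative integers, the error of this model equals the bitwise AND of the two operands restricted to the k masked bits. It is therefore bounded by 2^k - 1, and for uniformly random inputs its expected value is (2^k - 1)/4, which is consistent with the observation that the error grows as more bits are masked.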

Table 2. Computational errors of CMA and CC-CMA (columns: mask, max, average; panel (a): 64-bit adders)

3.5. Trade-off between Error, Delay and Power

From Figure 4 and Table 2, the trade-off between delay and computational error for 32-bit CC-CMA can be derived as shown in Figure 6. The figure shows that the delay can be shortened at the cost of computational error, but the cost is not low. Also, dynamically changing the delay of adders is beneficial only if the adders lie on the critical path of the entire circuit.

From Figure 5 and Table 2, the trade-off between power consumption and computational error for 32-bit CC-CMA can be derived as shown in Figure 7. A significant amount of power can be saved at the expense of computational error.

4. Conclusions

In this paper, we have proposed an approximate adder named carry-chain based carry-maskable adder (CC-CMA) for FPGAs. CC-CMA has a special input signal named mask to dynamically control the computational accuracy. CC-CMA takes advantage of the fast carry-chain modules built into modern FPGAs. By using the built-in carry chains, CC-CMA computes quickly at low power. The experimental results demonstrate the efficiency of CC-CMA compared with an accurate adder and an existing carry-maskable adder. In the future, we plan to evaluate CC-CMA using real-world applications. We also plan to develop accuracy-controllable multipliers for FPGAs based on CC-CMA.

Acknowledgments

This work is supported partly by KAKENHI 20H00590, 19H04081 and 21K19776.

References

[1] Esmaeilzadeh, Sampson, Ceze and D. Neural acceleration for general-purpose approximate programs, IEEE/ACM International Symposium on Microarchitecture, 2012.

[2] Gupta, Mohapatra, S. P. Park, Raghunathan and Roy, IMPACT: IMPrecise adders for low-power approximate computing, IEEE/ACM International Symposium on Low Power Electronics and Design, 2011.

[3] S. survey of techniques for approximate c ACM Computing Surveys, 2016.

[4] Xu, Mytkowicz, and N. computing: A s IEEE Design & Test, 2016.

[5] Gollu and P. carry maskable adder using modified full s Journal of Physics Conference Series, vol. 1921, no. 1, article 012049, 2021.

[6] P. K. Sujit, G. Bharat, and R. K. A power and area efficient approximate carry skip adder for error-resilient applications, Turkish Journal of Electrical Engineering and Computer Sciences, vol. 28, no. 1, pp. 443-457, 2019.

[7] Babita, Vishesh, and S. EFCSA: An efficient carry speculative approximate adder with rectification, IEEE 23rd International Symposium on Quality Electronic Design, 2022.

[8] Jungwon, Hyoju, Yerin, and K. Approximate adder design with simplified lower-part approximation, IEICE Electronics Express, vol. 17, no. 15, pp. 1-3, 2020.

[9] Kanani, Mehta and N. ACA-CSU: A carry selection based accuracy configurable approximate adder design, IEEE Computer Society Annual Symposium on VLSI, 2020.

[10] Rezaalipour, Dehyadegari and M. N. AxMAP: Making approximate adders aware of input patterns, IEEE Transactions on Computers, vol. 69, no. 6, pp. 868-882, 2020.

[11] Catelan, Santos and Duenha, Accuracy and physical characterization of approximate arithmetic circuits, XXI Simpósio em Sistemas Computacionais de Alto Desempenho, 2020.

[12] Venkataramani, V. K. Chippa, S. T. Chakradhar, Roy, and A. programmable vector processors for approximate computing, International Symposium on Microarchitecture, 2013.

[13] A. B. Kahng and S. -configurable adder for approximate arithmetic designs, Design Automation Conference, 2012.

[14] Yang, Ukezono and T. An accuracy-configurable adder for low-power applications, IEICE Trans. on Electronics, vol. E103-C, no. 3, pp. 68-76, 2020.