=Paper=
{{Paper
|id=Vol-3198/regular8
|storemode=property
|title=An Accuracy-Controllable Approximate Adder for FPGAs
|pdfUrl=https://ceur-ws.org/Vol-3198/paper8.pdf
|volume=Vol-3198
|authors=Masaki Sano,Hiroki Nishikawa,Xiangbo Kong,Hiroyuki Tomiyama,Tongxin Yang,Tomoaki Ukezono,Toshinori Sato
|dblpUrl=https://dblp.org/rec/conf/atait/SanoNKTYUS22
}}
==An Accuracy-Controllable Approximate Adder for FPGAs==
An Accuracy-Controllable Approximate Adder for FPGAs Masaki Sano 1, Hiroki Nishikawa 2, Xiangbo Kong 1, Hiroyuki Tomiyama 1, Tongxin Yang 3 , Tomoaki Ukezono4, and Toshinori Sato4 1 Graduate School of Science and Engineering, Ritsumeikan University, Shiga, Japan 2 Graduate School of Information Science and Technology, Osaka University, Osaka, Japan 3 Sony Semiconductor Solutions Corporation, Kanagawa, Japan 4 Department of Electronics Engineering and Computer Science, Fukuoka University, Fukuoka, Japan Abstract In this paper, we propose an accuracy-controllable approximate adder for FPGAs. The proposed adder has a special input to dynamically change the accuracy in addition to two operands. When accurate computation is required, the adder computes accurately. On the other hand, when accurate computation is not required, the adder computes inaccurately but quickly at low power. The important feature of our adder is that it utilizes carry-chain modules which are built in FPGAs. By using the carry chains, our approximate adder computes much faster at lower power than an existing approximate adder. Keywords 1 approximate computing, approximate adder, FPGA 1. Introduction In recent years, approximate computing technology has attracted attention to achieve higher performance and lower power consumption by tolerating a certain degree of computational error. Approximate computing techniques are used especially in the fields of image processing and machine learning since they are computationally expensive and error-tolerant to some extent [1][2]. Research on approximate computing circuits has been conducted at various design levels from the transistor to the architecture levels [3][4]. This paper focuses on approximate adders since addition is one of the most fundamental arithmetic operations. There are many studies on approximate adders which improve power-performance efficiency by disconnecting carry propagation [5]-[9]. The work in [10] proposes an approach to accurately calculate errors of approximate adders, and the work in [11] provides a detailed analysis of the trade-off between accuracy and resource efficiency. The authors of [12][13] state that circuit design with variable computational accuracy is desirable for designing systems that meet diverse requirements. An approximate adder circuit, named carry-maskable adder (CMA), that can change the computational accuracy is proposed in [14]. Since CMA enables dynamic control of accuracy, it is possible to perform approximate operations within an error tolerable range. Based on CMA in [14], in this paper, we propose an accuracy-controllable approximate adder for FPGAs. If the original CMA is implemented in FPGAs in a straightforward manner, lookup tables (LUTs) are connected in series, and hence, the delay and power increase significantly. Our proposed adder, named carry-chain based carry-maskable adder (CC-CMA), takes advantage of fast carry-chain modules which are built in FPGAs. This paper is organized as follows. Section 2 presents CC-CMA, and Section 3 analyzes the hardware cost, delay, computational error and power consumption of CC-CMA. Finally, Section 4 summarizes this paper and discusses future work. This work was done while the author was with Fukuoka University, Japan. The 4th International Symposium on Advanced Technologies and Applications in the Internet of Things (ATAIT 2022), August 24-26, 2022, Ibaraki, Japan 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) 2. Carry-Chain based Carry-Maskable Adder In this section, we first explain the carry-maskable adder [14], and then, we propose a carry-chain based carry-maskable adder for FPGAs. 2.1. Carry-Maskable Adder This work is based on an approximate adder, named carry-maskable adder (CMA), which was originally proposed in [14]. CMA can dynamically change the accuracy level according to the special input signal. Figure 1 (a) shows the diagram of an 8-bit CMA. The 8-bit CMA consists of a carry- maskable half adder (CMHA) and seven carry-maskable full adders (CMFAs) connected in series, as shown in Figure 1 (b). In addition to three inputs x, y and carry-in denoted as cin, CMFA has a special input named a mask to dynamically control the accuracy. If the mask is 0, CMFA performs exact addition. If the mask is 1, the carry-out signal Cout is 0, and the sum s is the logical sum of a and b assuming that the carry-in signal from the lower bit is 0. This computation is not accurate but approximate. However, since carry signals are not propagated from lower bits to upper bits, the delay and power consumption are reduced. By setting masks of lower bits to 1 and those of upper bits to 0, the upper (more significant) bits are computed accurately, and the lower (less significant) bits are computed approximately. Thus, by controlling mask, we can explore the trade-off between accuracy, delay, and power consumption, depending on the requirement of the applications. (a) 8-bit CMA (b) Carry-maskable full adder Figure 1. Carry-maskable adder [14] 2.2. Carry-Chain Based CMA Originally, CMA was designed for ASICs, and how to implement CMA on FPGAs is not presented or discussed in [14] despite the widespread use of FPGAs. If the Boolean expressions of CMA are given to FPGA synthesis tools, each CMFA is mapped to one or two lookup tables (LUTs), and the LUTs are connected in series, as shown in Figure 1 (a). This implementation is not efficient in terms of delay and power consumption since LUTs are slow and power-consuming. Recent FPGAs are equipped with carry-chain modules for fast addition. Figure 2 shows a schematic diagram of an accurate 4-bit adder using a built-in carry chain module. The carry-chain module consists of four multiplexers and four EXOR gates, and is connected from four LUTs. For accurate addition, each LUT is configured to compute EXOR of x and y as follows. As seen in Figure 2, carry signals go through the carry-chain module. In other words, the carry signals do not go through LUTs which are slow and power-consuming. Thanks to the built-in carry- chain module, addition is computed fast at low power. In this work, we take advantage of the carry-chain modules in the design of approximate adders. Figure 3 shows our proposed approximate adder, named carry-chain based carry-maskable adder (CC- CMA. LUTs labeled P0-P3 and G0-G3 compute Equations (2) and (3), respectively. Figure 2. 4-bit accurate adder using carry chain Figure 3. 4-bit CC-CMA (2) (3) Similar to the accurate adder shown in Figure 2, carry signals in CC-CMA do not go through LUTs but go through fast carry-chain modules when mask is 0. When mask is 1, carry signals do not propagate to upper bits, which achieves faster and lower-power computation than the accurate adder at the expense of computational inaccuracy. 3. Evaluation We have designed 32-bit and 64-bit CC-CMAs in Verilog-HDL, and synthesized them for Xilinx Artix-7 device with Xilinx Vivado 2019.2. For comparison, we have also designed accurate adders and original CMAs [14] in Verilog-HDL. The synthesized accurate adders utilize built-in carry-chain modules, as shown in Figure 2, while the synthesized CMAs do not. The three adders are compared in terms of hardware resources, delay, power consumption, maximum error and average error. Delay, average error and power consumption are obtained by post-synthesis simulation using the Vivado toolkit. For 32-bit and 64-bit adders, 100,000 and 1,000,000 random simulations are performed, respectively. Delay, error and power consumption of CMA and CC-CMA depend on the values of masks. Recall that CMA and CC-CMA compute accurately when masks of all bits are 0. When the masks of the least significant n-bits are set to 1, the lower n-bits are added approximately, and the upper bits are added accurately. In our experiments, we vary the number of lower bits whose masks are set to 1. 3.1. Hardware Resources Table 1 compares hardware resources in terms of the number of 6-input LUTs. The original CMAs use twice as many LUTs as the accurate adders and CC-CMAs. CC-CMAs are as small as the accurate adders, although the functionality of CC-CMAs is more complex. Table 1. Number of LUTs Accurate adder CMA CC-CMA 32-bit adders 32 63 32 64-bit adders 64 127 64 3.2. Delay Figure 4 shows the delay of the adders. The longest delays among 100,000 and 1,000,000 random simulations are shown for 32-bit and 64-bit adders, respectively. For CMA and CC-CMA, the value of the mask is varied. The X-axis in the figure shows the number of lower-bits whose mask is set to 1. For example, in Figure 4 (a), when the mask is 0, CMA and CC-CMA compute accurately. When the mask is 4, the lower 4 bits are computed approximately, and the upper 28 bits are computed accurately. (a) 32-bit adders (b) 64-bit adders Figure 4. Delay (ns) Figure 4 clearly shows that addition is computed faster when more bits are approximated. The figure also demonstrates that CC-CMAs are not efficient because they do not take advantage of built-in carry- chains. 3.3. Power Consumption Figure 5 shows the power consumption of the adders. The original CMAs consume much more power than the accurate adders and CC-CMAs. An interesting observation in the figure is that CC- CMAs consume less power than the accurate adders even in case mask is 0 and CC-CMAs compute accurately. Although the Boolean function of CC-CMA is more complex than that of the original CMA, the internal signals (e.g., the input signals to the multiplexers) of CC-CMA switch less frequently, (a) 32-bit adders (b) 64-bit adders Figure 5. Power consumption (µW) leading to lower power consumption. It is also observed in Figure 5 that the power consumption of CMA and CC-CMA decreases as more bits are masked and approximated. 3.4. Computational Errors So far, we have seen that CC-CMA computes faster at lower power than the accurate adders. These advantages come at the cost of computational error. Table 2 shows the degree of computational errors of CMA and CC-CMA. Recall that the functions of CMA and CC-CMA are exactly the same, and therefore, the amounts of errors of the two adders are the same. random simulation. As more bits are masked, the computational error increases. Table 2. Computational errors of CMA and CC-CMA (a) 32-bit adders mask 4 8 12 16 20 24 28 32 max average (a) 64-bit adders mask 4 8 12 16 20 24 28 32 max average mask 36 40 44 48 52 56 60 64 max average 3.5. Trade-off between Error, Delay and Power From Figure 4 and Table 2, the trade-off between delay and computational error for 32-bit CC-CMA can be derived as shown in Figure 6. The figure shows that the delay can be shortened at the cost of computational error, but the cost is not low. Also, it is not beneficial to dynamically change the delay of adders unless the adders exist on the critical path of the entire circuits. Figure 6. Trade-off between delay and error for 32-bit CC-CMA From Figure 5 and Table 2, the trade-off between power consumption and computational error for 32-bit CC-CMA can be derived as shown in Figure 7. A significant amount of power can be saved at the expense of computational error. Figure 7. Trade-off between power consumption and error for 32-bit CC-CMA 4. Conclusions In this paper, we have proposed an approximate adder named carry-chain based carry maskable adder (CC-CMA) for FPGAs. CC-CMA has a special input signal named a mask to dynamically control the computational accuracy. CC-CMA takes advantage of fast carry-chain modules which are equipped in modern FPGAs. By using the built-in carry chains, CC-CMA computes fast at low power. The experimental results demonstrate the efficiency of CC-CMA compared with an accurate adder and an existing carry-maskable adder. In future, we plan to evaluate CC-CMA using real-world applications. Also, we plan to develop accuracy-controllable multipliers for FPGAs based on CC-CMA. Acknowledgments This work is supported partly by KAKENHI 20H00590, 19H04081 and 21K19776. References [1] H. Esmaeilzadeh, A. Sampson, L. Ceze and D. Neural acceleration for general-purpose approximate programs, IEEE/ACM International Symposium on Microarchitecture, 2012. [2] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan and K. Roy IMPACT: IMPrecise adders for low-power approximate computing, IEEE/ACM International Symposium on Low Power Electronics and Design, 2011. [3] S. survey of techniques for approximate c ACM Computing Surveys, 2016. [4] Q. Xu, T. Mytkowicz, and N. computing: A s IEEE Design & Test, 2016. [5] R. Gollu and P. carry maskable adder using modified full s Journal of Physics Conference Series, vol.1921, no.1, article. 012049, 2021. [6] P. K. Sujit, G. Bharat, and R. K. A power and area efficient approximate carry skip adder for error-resilient applications, Turkish Journal of Electrical Engineering and Computer Sciences, vol.28, no.1, pp.443-457, 2019. [7] J. Babita, M. Vishesh, and S. EFCSA: An efficient carry speculative approximate adder with rectification, IEEE 23rd International Symposium on Quality Electronic Design, 2022. [8] L. Jungwon, S. Hyoju, K. Yerin, and K. Approximate adder design with simplified lower-part approximation, IEICE Electronics Express, vol. 17, no. 15, pp. 1-3, 2020. [9] A. Kanani, J. Mehta and N. ACA-CSU: A carry selection based accuracy configurable approximate adder design, IEEE Computer Society Annual Symposium on VLSI, 2020. [10] M. Rezaalipour, M. Rezaalipour, M. Dehyadegari and M. N. AxMAP: Making approximate adders aware of input patterns, IEEE Transactions on Computers, vol. 69, no. 6, pp. 868-882, 2020. [11] D. Catelan, R. Santos and L. Duenha, Accuracy and physical characterization of approximate arithmetic circuits, XXI Simpósio em Sistemas Computacionais de Alto Desempenho, 2020. [12] S. Venkataramani, V. K. Chippa, S. T. Chakradhar, K. Roy, and A. programmable vector processors for approximate computing, International Symposium on Microarchitecture, 2013. [13] A. B. Kahng and S. -configurable adder for approximate arithmetic designs, Design Automation Conference, 2012. [14] T. Yang, T. Ukezono and T. An accuracy-configurable adder for low-power applications IEICE Trans. on Electronics, vol. E103-C, no. 3, pp. 68-76, 2020.