In recent years, error recovery circuits have been adopted in optimized data-path units designed with the approximate computing methodology. This paper proposes novel multipliers built from two new approximate 4:2 compressors, one generating an Error-free Sum (ES) and the other an Error-free Carry (EC). The Proposed-ES and Proposed-EC compressors perform Partial Product (PP) compression. The PPs are arranged following the Dadda structure: the Dadda multiplier is chosen for the compressor implementation because its regular PP arrangement favors straightforward standard-cell ASIC design. The proposed compression approach is more effective in the smallest
Multiplication and addition are the essential functions carried out by the Arithmetic Unit (AU), and AUs form the basic elements of processing applications such as signal, image, and multimedia processing. The operating frequency of a processing element is determined by the critical delay of the AU, which in turn depends on the delay of the multiplier. Array multiplication is the standard algorithm for multiplying two input operands; it generates the PPs in parallel. The PP bits are shifted according to their weights and added using a Carry Propagate Adder (CPA).
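To make the PP generation step concrete, here is a minimal Python sketch of an unsigned n × n multiplication: each PP bit is a_j AND b_i, each row is placed at its binary weight, and the rows are summed. The function names are hypothetical, and a plain integer sum stands in for the adder array and the final CPA.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    """Return the n PP rows of an unsigned n x n multiplication.

    Row i collects the bits a_j AND b_i (one AND gate each), placed at
    binary weight i + j, i.e. the row is already shifted left by i.
    """
    rows = []
    for i in range(n):
        b_i = (b >> i) & 1
        row = 0
        for j in range(n):
            a_j = (a >> j) & 1
            row |= (a_j & b_i) << (i + j)
        rows.append(row)
    return rows


def multiply(a: int, b: int, n: int = 8) -> int:
    """Add the weighted PP rows; the integer sum stands in for the adders and CPA."""
    return sum(partial_products(a, b, n))


print(partial_products(0b1011, 0b0101, 4))  # [11, 0, 44, 0]
print(multiply(11, 5, 4))                   # 55
```

In hardware the row summation is the expensive part, which is why the rest of the paper focuses on compressing these rows before the final addition.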
Scaling to deep-submicron technologies increases the integration density of VLSI chips. The power density increases proportionately, and heat dissipation becomes a problem. Operating at a low supply voltage mitigates the power density issue but increases delay. Many approaches to designing multipliers for optimum performance have been proposed in the literature. Low-error, area-efficient truncated multipliers are proposed for fixed-width applications [
Approximate computing is a recent methodology for the logic design of high-speed, low-power, and area-efficient architectures for error-tolerant applications. Approaches to the design of area- and power-efficient approximate adders are proposed in [
In this paper, two novel approximate 4:2 compressors are proposed that reduce area, power, and delay, and generate either the sum or the carry without error. The proposed compressors target PP compression in Dadda multipliers, which have a regular, structure-based PP arrangement. In the targeted multipliers, for
The rest of this paper is organized as follows. Section 2 presents a typical exact 4:2 compressor and its structural modifications. Section 3 describes the design of the proposed multipliers and their functionality. The performance of the proposed designs is evaluated in Section 4. Section 5 demonstrates the proposed multipliers in image and signal processing applications. Finally, Section 6 concludes the paper.
The 4:2 compressor is a natural choice for PP compression in multipliers for arithmetic units with a fixed data-path width of 2^N bits (N = 3, 4, 5, 6): its 4-to-2 reduction maps evenly onto power-of-two data-path widths, which is its major advantage for PP compression in fixed-width data paths. The compressor is implemented in a Dadda multiplier because its regular PP arrangement favors straightforward standard-cell ASIC design. Subsection 2.1 explains the modified 4:2 compressor and the design of the approximate compressors.
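The even fit between 4:2 compression and 2^N-bit operands can be seen by counting PP rows layer by layer. The sketch below is an idealized row count under the assumption that every layer turns four rows into two. A real Dadda tree places compressors column by column and uses half or full adders where a column is short, so this is not a model of the actual tree. The function name is hypothetical.

```python
def rows_per_stage(n: int) -> list[int]:
    """PP row count after each layer of 4:2 compressors for an n x n multiplier.

    Idealized: assumes n = 2**N and that every layer turns four rows into
    two, until the two rows left over go to the final adder.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    heights = [n]
    while heights[-1] > 2:
        heights.append(heights[-1] // 2)  # four rows in, two rows out
    return heights


for N in (3, 4, 5, 6):
    n = 2 ** N
    stages = rows_per_stage(n)
    print(f"{n}-bit: rows {stages}, {len(stages) - 1} compression stage(s)")
```

For 2^N-bit operands this gives N − 1 compression layers with no leftover rows, whereas a width that is not a power of two would leave partially filled compressor groups.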
The exact compressor adds the n 1-bit inputs, where n is equal to the functionality
For PP compression in multipliers, the standard Boolean expressions of the compressor are modified by tying C_{i} to logic low and ignoring the output C_{o}. The modified 4:2 compressor has the same inputs A_{1}–A_{4} and produces Sum (S) and Carry (C). Ignoring C_{o} causes an error only when A_{1}A_{2}A_{3}A_{4} = “1111”, which occurs with probability 0.06 (1/16 for uniformly distributed inputs), and the maximum error distance (Max-ED) is −2.
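The claim above can be checked exhaustively over the 16 input patterns. The sketch models the exact 4:2 compressor as the usual pair of cascaded full adders and a modified compressor with C_{i} = 0 and C_{o} dropped. The gate-level expression used for C is an assumption: any logic where S is the input parity and C is high for two or more high inputs behaves the same. All function names are hypothetical.

```python
from itertools import product


def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c


def exact_4_2(a1, a2, a3, a4, ci=0):
    """Exact 4:2 compressor as two cascaded full adders.

    Satisfies a1 + a2 + a3 + a4 + ci == s + 2 * (c + co).
    """
    s1, co = full_adder(a1, a2, a3)
    s, c = full_adder(s1, a4, ci)
    return s, c, co


def modified_4_2(a1, a2, a3, a4):
    """C_i tied low and C_o dropped (assumed gate-level form).

    S is the parity of the inputs; C is high when at least two inputs are high.
    """
    s = a1 ^ a2 ^ a3 ^ a4
    c = (a1 & a2) | (a3 & a4) | ((a1 ^ a2) & (a3 ^ a4))
    return s, c


# Exhaustive check over all 16 input patterns (C_i = 0).
errors = {}
for bits in product((0, 1), repeat=4):
    s, c, co = exact_4_2(*bits)
    assert sum(bits) == s + 2 * (c + co)
    s_m, c_m = modified_4_2(*bits)
    ed = (s_m + 2 * c_m) - sum(bits)   # error distance of the modified compressor
    if ed:
        errors[bits] = ed

print(errors)              # {(1, 1, 1, 1): -2}
print(len(errors) / 16)    # 0.0625
```

With only S and C left, the outputs can represent at most 3, so the count of 4 produced by “1111” is the single case that cannot be encoded.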
Conversely, the approximation algorithm that generates
The computation delays of the AND, OR, XOR, NOT, XNOR, and NAND gates are denoted as t_{AND}, t_{OR}, t_{XOR}, t_{NOT}, t_{XNOR}, and t_{NAND}, respectively.
However, in the multiplier implementation explained in Section 3, the error caused by ignoring C_{o} in the modified compressor is compensated by adding the bit E = A_{1} & A_{2} & A_{3} & A_{4}.
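Because the only failing pattern loses exactly 2, one extra bit at the carry's weight is enough to cancel the error. The sketch checks this arithmetic. The gate-level form of the modified compressor and the function names are assumptions. Where E is injected in the compression tree is described in Section 3 and is not modeled here.

```python
from itertools import product


def modified_4_2(a1, a2, a3, a4):
    """Modified 4:2 compressor, C_i = 0 and C_o ignored (assumed gate-level form)."""
    s = a1 ^ a2 ^ a3 ^ a4
    c = (a1 & a2) | (a3 & a4) | ((a1 ^ a2) & (a3 ^ a4))
    return s, c


def compensated_4_2(a1, a2, a3, a4):
    """Return (S, C, E) where E = A1 & A2 & A3 & A4 restores the lost count."""
    s, c = modified_4_2(a1, a2, a3, a4)
    e = a1 & a2 & a3 & a4
    return s, c, e


# E has the same weight as C (one column left of S), so S + 2*(C + E)
# reproduces the exact count for every input pattern, including "1111".
for bits in product((0, 1), repeat=4):
    s, c, e = compensated_4_2(*bits)
    assert s + 2 * (c + e) == sum(bits), bits
print("compensation exact for all 16 patterns")
```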
This section briefly describes the design of efficient approximate 4:2 compressors that generate either the sum or the carry without error. In proposed design 1, the logic generates no error in the sum and three errors in the carry; it is referred to hereafter as the “Proposed compressor with Exact Sum” (Proposed-ES). In proposed design 2, the logic generates no error in the carry and three errors in the sum; it is referred to hereafter as the “Proposed compressor with Exact Carry” (Proposed-EC). Since our approximate compressors target high-speed multipliers with reduced error, they are placed in the PP compression logic according to the significance of each compressor's sum and carry signals on the multiplier output.
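Counts such as “no error in the sum and three errors in the carry” refer to how many input patterns produce a wrong output bit. The sketch below shows one way to count them against the modified compressor (C_{i} = 0, C_{o} ignored), which is assumed here as the reference. The function `hypothetical_es` is purely illustrative and is NOT the Proposed-ES logic, whose expressions are not reproduced in this section; all names are hypothetical.

```python
from itertools import product


def reference_bits(a1, a2, a3, a4):
    """Sum and carry of the modified 4:2 compressor (C_i = 0, C_o ignored)."""
    s = a1 ^ a2 ^ a3 ^ a4
    c = int(a1 + a2 + a3 + a4 >= 2)
    return s, c


def count_output_errors(candidate):
    """(sum_errors, carry_errors): input patterns out of 16 with a wrong S or C."""
    sum_err = carry_err = 0
    for bits in product((0, 1), repeat=4):
        s_ref, c_ref = reference_bits(*bits)
        s, c = candidate(*bits)
        sum_err += s != s_ref
        carry_err += c != c_ref
    return sum_err, carry_err


def hypothetical_es(a1, a2, a3, a4):
    """Illustrative ES-style compressor, NOT the Proposed-ES logic from this paper.

    The sum is the exact parity; the carry omits the A1A4, A2A3 and A2A4 terms,
    so it is wrong for the patterns 1001, 0110 and 0101.
    """
    s = a1 ^ a2 ^ a3 ^ a4
    c = (a1 & a2) | (a3 & a4) | (a1 & a3)
    return s, c


print(count_output_errors(hypothetical_es))  # (0, 3)
```

The same counter applied to an EC-style candidate would report its sum errors in the first position and zero carry errors.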
In the Proposed-ES design shown in
In reference to the Boolean
In the Proposed-EC design shown in
It is well known from the Boolean
The logic depth of the Proposed-EC compressor is t_{Prop-EC} = 2 × t_{XOR} + t_{OR}.
This section briefly describes the design of
In the Area Efficient (AE) design, designated Proposed-AE (P-AE), carry bits are not generated in the least significant (LS) imprecise part of the final stage, and the most significant (MS) bits of the exact part are added using
Note that we use the Proposed-EC compressor in the approximate part of stage 1, since the carry signal has a greater influence on stage 2 and on the multiplier output. In stage 2, we use the Proposed-ES compressor for the columns with binary weights 3–6, because the sum signal has a higher influence on the multiplier output in these columns. For the column with binary weight 7, however, we use the Proposed-EC compressor, because the carry signal of this column has higher significance for the final result. Note from
In the Proposed Area Efficient with Error Recovery multiplier (P-AEER), carry bits are generated in the two most significant PP columns of the approximate part in the pre-final stage. An error recovery signal E_{R} is generated by ANDing these carry bits and is added to the least significant carry signal of the accurate part. The logic of E_{R} is given by
The symbols + and & denote logical OR and logical AND operations, respectively.
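The compressor placement rules and the error recovery signal described above can be summarized in a small model. This is a hypothetical sketch: the function names, the carry argument names `c_top` and `c_next`, and the assumption that accurate-part columns use exact compressors are illustrative. Only the placement cases stated in the text are encoded, and the E_{R} equation itself is not reproduced in this section.

```python
def select_compressor(stage: int, weight: int, approximate_part: bool) -> str:
    """Hypothetical encoding of the compressor choice described in the text.

    Only the stated cases are covered; accurate-part columns are assumed
    to use exact compressors.
    """
    if not approximate_part:
        return "exact"
    if stage == 1:
        return "Proposed-EC"      # carries have more influence on stage 2 and the output
    if stage == 2 and 3 <= weight <= 6:
        return "Proposed-ES"      # sum signal has more influence in these columns
    if stage == 2 and weight == 7:
        return "Proposed-EC"      # carry of this column is more significant
    raise ValueError(f"no rule given for stage {stage}, weight {weight}")


def error_recovery(c_top: int, c_next: int) -> int:
    """E_R: AND of the carries from the two most significant approximate columns."""
    return c_top & c_next


def accurate_carry_in(c_ls: int, c_top: int, c_next: int) -> int:
    """Arithmetic sum of the accurate part's least significant carry and E_R."""
    return c_ls + error_recovery(c_top, c_next)


for w in range(3, 8):
    print(w, select_compressor(2, w, approximate_part=True))
print(accurate_carry_in(1, 1, 1))  # 2
```

A value of 2 from `accurate_carry_in` simply means the recovered bit ripples one position further into the accurate part.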
1 | 2 | 3 | 4 |
---|---|---|---|
00 | 01 | 10 | 11 |
11 | 11 | 11 | 11 |
2^{6} | 2^{6} | 2^{6} | 2^{6} |
2^{7} | 2^{7} | 2^{6} | 2^{6} |
The proposed multipliers and inexact compressors, together with the designs from the literature used for comparison, are modeled in structural Verilog HDL. The multipliers are synthesized with Cadence Encounter in 90 nm technology. To select the supply voltage for simulation, we estimated the power and delay of the proposed compressors and found that their PDP is low at a 1 V supply; hence, the multipliers built with the new compressor variants and the other multiplier variants are compared using simulations at a 1 V supply.
The power, area, delay, and PDP of the proposed compressors and of the state-of-the-art approximate designs used for comparison are shown in
Compressors | Power (nW) | Area | Delay (ps) | PDP (× 10^{−16} J) | Errors in carry | Errors in sum |
---|---|---|---|---|---|---|
Exact design | 1597 | 35 | 708 | 11.31 | – | – |
XOR-XNOR | 2957 | 54 | 607 | 17.95 | 3 | 0 |
TA4-2C | 519 | 18 | 222 | 1.152 | 2 | 3 |
DQ4:2C3 | 836 | 20 | 220 | 1.84 | 5 | 5 |
4-2CAM | 549 | 17 | 192 | 1.05 | 2 | 3 |
MADM | 1293 | 31 | 499 | 6.45 | 1 | 1 |
I4-2C | 1366 | 32 | 501 | 6.84 | 4 | 4 |
Proposed-ES | 1169 | 26 | 470 | 5.49 | 3 | 0 |
Proposed-EC | 919 | 27 | 501 | 4.60 | 0 | 3 |
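The PDP column is consistent with power × delay when power is read in nanowatts and delay in picoseconds; these units are inferred from that consistency check rather than stated explicitly in the source. The sketch below recomputes the column under that assumption, with the values copied from the table.

```python
# name: (power, delay, reported PDP in units of 1e-16 J), copied from the table above
COMPRESSORS = {
    "Exact design": (1597, 708, 11.31),
    "XOR-XNOR": (2957, 607, 17.95),
    "TA4-2C": (519, 222, 1.152),
    "DQ4:2C3": (836, 220, 1.84),
    "4-2CAM": (549, 192, 1.05),
    "MADM": (1293, 499, 6.45),
    "I4-2C": (1366, 501, 6.84),
    "Proposed-ES": (1169, 470, 5.49),
    "Proposed-EC": (919, 501, 4.60),
}


def pdp_e16(power_nw: float, delay_ps: float) -> float:
    """PDP in units of 1e-16 J, assuming power in nW and delay in ps (nW * ps = 1e-21 J)."""
    return power_nw * delay_ps * 1e-5


for name, (p, d, reported) in COMPRESSORS.items():
    print(f"{name:13s} computed {pdp_e16(p, d):6.2f}  reported {reported}")
```

Every computed value agrees with the reported PDP to within 0.01.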
Error metrics are important parameters for evaluating the efficacy of an approximate design in error-tolerant applications. In this section, the proposed approximate multipliers and state-of-the-art approximate designs, including recent works such as the Modified architecture of Dadda Multiplier (MADM), the 4-2 compressor-based approximate multiplier (4-2CAM), and DQ4:2C3, are evaluated against the standard (exact) output in terms of various error metrics. The accuracy metrics considered are Mean Error Distance (MED), Mean Relative Error Distance (MRED), Normalized Error Distance (NED), and Percentage Accuracy.
Metric | 4-2CAM | DQ4:2C1 | DQ4:2C2 | DQ4:2C3 | DQ4:2C4 | MADM | I4-2C | Proposed-basic | P-AE | P-AEER |
---|---|---|---|---|---|---|---|---|---|---|
MED | 348 | 899 | 427.5 | 951.8 | 401.5 | 320.3 | 118.6 | 121.4 | 129.7 | 128.1 |
MRED | 0.13 | 0.22 | 0.13 | 0.22 | 0.10 | 0.07 | 0.019 | 0.02 | 0.03 | 0.03 |
NED | 0.14 | 0.20 | 0.10 | 0.22 | 0.13 | 0.09 | 0.02 | 0.03 | 0.02 | 0.04 |
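Error metrics of this kind are normally obtained by exhaustive simulation over all input pairs. The following sketch shows how MED, MRED, and NED can be computed for any multiplier model. A simple product-truncation multiplier is used only as a hypothetical stand-in for the compressor-based designs. The NED normalization shown (MED divided by the largest error distance observed) is one common convention, assumed here because the paper's exact definition is not reproduced in this section.

```python
def error_metrics(approx_mul, n: int = 8) -> tuple[float, float, float]:
    """Exhaustive MED, MRED and NED of an unsigned n x n approximate multiplier.

    ED = |approx - exact|; MED is the mean ED; MRED averages ED / exact over
    inputs with a non-zero exact product; NED here is MED divided by the
    largest ED observed (assumed normalization).
    """
    total_ed = 0
    total_red = 0.0
    max_ed = 0
    nonzero = 0
    for a in range(2 ** n):
        for b in range(2 ** n):
            exact = a * b
            ed = abs(approx_mul(a, b) - exact)
            total_ed += ed
            max_ed = max(max_ed, ed)
            if exact:
                total_red += ed / exact
                nonzero += 1
    med = total_ed / 4 ** n
    mred = total_red / nonzero
    ned = med / max_ed if max_ed else 0.0
    return med, mred, ned


def truncated_mul(a: int, b: int, k: int = 4) -> int:
    """Hypothetical stand-in approximate multiplier: zeroes the k LSBs of a*b."""
    return (a * b) >> k << k


med, mred, ned = error_metrics(truncated_mul)
print(f"MED={med:.2f}  MRED={mred:.4f}  NED={ned:.4f}")
```

A bit-accurate model of a compressor-based multiplier can be passed in place of `truncated_mul` without changing the metric code.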
The total power dissipation (power), area, delay, and PDP of the proposed and state-of-the-art multipliers are shown in
Compared with the multipliers DQ4:2C3, DQ4:2C4, Modified architecture of Dadda Multiplier, and Imprecise 4-2 compressor, the PDP of our basic, AE, and AEER multipliers improves by 3.7%, 14.8%, and 14.6%; 14.2%, 24.1%, and 23.9%; 16.8%, 26.3%, and 26.1%; and 46.7%, 52.9%, and 52.7%, respectively. The designs DQ4:2C1 and DQ4:2C2 fare better in PDP than the proposed designs, but at the cost of lower accuracy. The effectiveness of the proposed and prior multipliers in optimizing both energy (PDP) and error is shown through the PDP × error metric in
The area of the proposed designs is lower than that of the exact multiplier and of the Imprecise 4-2 compressor design. The approximate multipliers DQ4:2C1, DQ4:2C2, DQ4:2C3, and DQ4:2C4 use approximate compressors in all PP columns, while the 4-2 compressor-based approximate multiplier and the Modified architecture of Dadda Multiplier do not add an error compensation bias for A_{1}A_{2}A_{3}A_{4} = “1111” in the exact part; hence, these designs exhibit lower area than the proposed designs. However, the average errors of DQ4:2C1, DQ4:2C2, DQ4:2C3, DQ4:2C4, the 4-2 compressor-based approximate multiplier, and the Modified architecture of Dadda Multiplier are significantly higher. Additionally, note from
Multipliers | Static power (µW) | Total power (µW) | Area (µm^{2}) | Delay (ps) | PDP (× 10^{−13} J) | Maximum error | % Accuracy | PDP × error (× 10^{−11}) |
---|---|---|---|---|---|---|---|---|
Exact | 11.3 | 80.496 | 1415 | 5882 | 4.735 | – | 100 | – |
4-2CAM | 7.23 | 49.788 | 1091 | 3689 | 1.837 | 512 (2^{n+1}) | 87.24 | 23.4 |
DQ4:2C1 | 5.67 | 40.085 | 861 | 3666 | 1.47 | 65,536 (2^{2n}) | 77.92 | 32.45 |
DQ4:2C2 | 6.14 | 44.280 | 942 | 3666 | 1.623 | 65,536 (2^{2n}) | 86.64 | 21.68 |
DQ4:2C3 | 7.09 | 52.892 | 1119 | 3755 | 1.986 | 65,536 (2^{2n}) | 77.76 | 44.17 |
DQ4:2C4 | 8.27 | 57.336 | 1200 | 3887 | 2.229 | 65,536 (2^{2n}) | 89.68 | 23 |
MADM | 7.76 | 60.472 | 1221 | 3798 | 2.297 | 512 (2^{n+1}) | 93.48 | 14.97 |
I4-2C | 10.8 | 76.8 | 1402 | 4673 | 3.589 | 512 (2^{n+1}) | 97.72 | 8.18 |
Proposed-basic | 7.96 | 55.636 | 1364 | 3437 | 1.912 | 512 (2^{n+1}) | 97.62 | 4.55 |
P-AE | 7.44 | 52.206 | 1248 | 3241 | 1.692 | 512 (2^{n+1}) | 97.34 | 4.5
P-AEER | 7.51 | 52.351 | 1263 | 3241 | 1.697 | 512 (2^{n+1}) | 97.46 | 4.3
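Several derived numbers in this comparison can be reproduced from the table. The sketch below assumes total power in µW and delay in ps, checks PDP = power × delay, reproduces the PDP × error column as PDP (in 10^{−13} J) times the error percentage 100 − % accuracy (a relation inferred from the tabulated values), and recomputes the PDP improvement percentages quoted in the text from the reported PDP values. The dictionary and function names are hypothetical.

```python
# name: (total power uW, delay ps, reported PDP x1e-13 J, % accuracy), from the table above
MULTIPLIERS = {
    "Exact": (80.496, 5882, 4.735, 100.0),
    "4-2CAM": (49.788, 3689, 1.837, 87.24),
    "DQ4:2C1": (40.085, 3666, 1.47, 77.92),
    "DQ4:2C2": (44.280, 3666, 1.623, 86.64),
    "DQ4:2C3": (52.892, 3755, 1.986, 77.76),
    "DQ4:2C4": (57.336, 3887, 2.229, 89.68),
    "MADM": (60.472, 3798, 2.297, 93.48),
    "I4-2C": (76.8, 4673, 3.589, 97.72),
    "Proposed-basic": (55.636, 3437, 1.912, 97.62),
    "P-AE": (52.206, 3241, 1.692, 97.34),
    "P-AEER": (52.351, 3241, 1.697, 97.46),
}
OURS = ("Proposed-basic", "P-AE", "P-AEER")


def pdp_from_power_delay(power_uw: float, delay_ps: float) -> float:
    """PDP in units of 1e-13 J: uW * ps = 1e-18 J, so scale by 1e-5."""
    return power_uw * delay_ps * 1e-5


def pdp_x_error(name: str) -> float:
    """Reported PDP (1e-13 J) multiplied by the error percentage 100 - % accuracy."""
    _, _, pdp, acc = MULTIPLIERS[name]
    return pdp * (100.0 - acc)


def pdp_improvement(ours: str, ref: str) -> float:
    """Percentage PDP reduction of `ours` relative to `ref`, from the reported PDPs."""
    p_ours, p_ref = MULTIPLIERS[ours][2], MULTIPLIERS[ref][2]
    return 100.0 * (p_ref - p_ours) / p_ref


for name, (p, d, pdp, acc) in MULTIPLIERS.items():
    print(f"{name:15s} PDP={pdp_from_power_delay(p, d):.3f}  PDPxerror={pdp_x_error(name):6.2f}")

for ref in ("DQ4:2C3", "DQ4:2C4", "MADM", "I4-2C"):
    print(ref, [round(pdp_improvement(o, ref), 1) for o in OURS])
```

The second loop reproduces the improvement percentages quoted for the basic, AE, and AEER multipliers.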
Parameter | n = 8, Proposed-basic | n = 8, P-AE | n = 8, P-AEER | n = 12, Proposed-basic | n = 12, P-AE | n = 12, P-AEER | n = 16, Proposed-basic | n = 16, P-AE | n = 16, P-AEER |
---|---|---|---|---|---|---|---|---|---|
Power (µW) | 55.64 | 52.21 | 52.35 | 172.8 | 168.9 | 173.87 | 350.3 | 340.4 | 342.3 |
Delay (ns) | 3.44 | 3.24 | 3.24 | 5.18 | 4.88 | 4.95 | 8.808 | 8.246 | 8.246 |
Area (µm^{2}) | 1364 | 1248 | 1263 | 3436 | 3316 | 3356 | 6587 | 6348 | 6381 |
PDP (× 10^{−15} J) | 191.4 | 169.16 | 169.61 | 895.1 | 824.23 | 860.7 | 3086.14 | 2806.93 | 2822.6 |
EDP (× 10^{−24} J·s) | 658.4 | 548.1 | 549.5 | 4636.6 | 4022.2 | 4260.5 | 27188.9 | 23016.8 | 23145.3 |
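The PDP and EDP rows follow from the power and delay rows: PDP = power × delay and EDP = PDP × delay, so with power in µW and delay in ns, PDP comes out in fJ and EDP in units of 10^{−24} J·s. The sketch below recomputes both from the table; the names are hypothetical.

```python
DESIGNS = ("Proposed-basic", "P-AE", "P-AEER")
# width -> [(power in uW, delay in ns) per design], from the table above
SCALING = {
    8: [(55.64, 3.44), (52.21, 3.24), (52.35, 3.24)],
    12: [(172.8, 5.18), (168.9, 4.88), (173.87, 4.95)],
    16: [(350.3, 8.808), (340.4, 8.246), (342.3, 8.246)],
}


def pdp_fj(power_uw: float, delay_ns: float) -> float:
    """PDP in fJ: uW * ns = 1e-15 J."""
    return power_uw * delay_ns


def edp(power_uw: float, delay_ns: float) -> float:
    """EDP in units of 1e-24 J*s: PDP (fJ) times delay (ns)."""
    return pdp_fj(power_uw, delay_ns) * delay_ns


for n, rows in SCALING.items():
    for name, (p, d) in zip(DESIGNS, rows):
        print(f"n={n:2d} {name:14s} PDP={pdp_fj(p, d):8.2f} fJ  EDP={edp(p, d):9.1f}")
```

For n = 8 and n = 12 the printed values agree with the table to within rounding.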
To validate the modified multipliers in error-tolerant applications, image enhancement (smoothing and scaling) and signal processing applications are implemented on an FPGA board. The Verilog HDL models of the new multiplier variants and of the state-of-the-art approximate designs from the literature are synthesized using the Xilinx ISE 14.2 tool, and the prototype of the application system is built on a Spartan-6 FPGA (XC6SLX45-CSG324 device). Input images and signals are sent to the FPGA board through Xilinx-MATLAB co-simulation with the System Generator tool.
Image smoothing is performed on digital images to reduce noise. It is a pre-processing operation applied to images before the main object extraction. The smoothing operation averages the pixel intensity values of the input image within a predefined window and replaces the pixel being processed with the result. The weight of each pixel in the window depends on the type of mask used. For example, a filter with a 3 × 3 mask replaces the pixel being processed with the intensity value
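The weighted-window averaging described above can be sketched as follows. Because the paper's mask equation is not reproduced here, the 1-2-1 binomial mask (weights summing to 16) is an assumed example. The pixel × weight product goes through a `mul` parameter so that a model of an approximate multiplier could be substituted for exact multiplication; the function name is hypothetical.

```python
def smooth3x3(image, mul=lambda a, b: a * b):
    """Weighted 3x3 smoothing of an 8-bit grayscale image given as a list of rows.

    `mul` computes pixel x weight, so a model of an approximate multiplier can
    be swapped in. The 1-2-1 binomial mask (weights summing to 16) is an
    assumed example; border pixels are copied unchanged.
    """
    mask = ((1, 2, 1), (2, 4, 2), (1, 2, 1))
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += mul(image[y + dy - 1][x + dx - 1], mask[dy][dx])
            out[y][x] = min(255, acc >> 4)   # divide by 16, clamp to 8 bits
    return out


flat = [[100] * 5 for _ in range(5)]
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 160
print(smooth3x3(flat) == flat)   # True: a uniform image is unchanged
print(smooth3x3(spike)[2][2])    # 40 (160 * 4 / 16)
```

Since the weights sum to a power of two, the division reduces to a shift, so only the multiplications depend on the multiplier under test.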
A 27-tap finite impulse response (FIR) filter is implemented to check the functionality of the proposed multipliers.
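A direct-form FIR filter makes the role of the multiplier explicit: every output sample is a sum of 27 coefficient × sample products. The sketch below is a minimal model with a pluggable `mul` function. The symmetric triangular taps are hypothetical, since the paper's coefficients are not given, and the function name is illustrative.

```python
def fir_filter(x, coeffs, mul=lambda a, b: a * b):
    """Direct-form FIR filter: y[k] = sum_i h[i] * x[k - i], with x[k] = 0 for k < 0.

    `mul` is the multiplier under test; exact integer multiplication by default.
    """
    y = []
    for k in range(len(x)):
        acc = 0
        for i, h in enumerate(coeffs):
            if k - i < 0:
                break            # earlier input samples are zero
            acc += mul(h, x[k - i])
        y.append(acc)
    return y


TAPS = 27
# Hypothetical symmetric (linear-phase) taps 1, 2, ..., 14, ..., 2, 1.
coeffs = [min(i + 1, TAPS - i) for i in range(TAPS)]
impulse = [1] + [0] * (TAPS + 4)
print(fir_filter(impulse, coeffs) == coeffs + [0] * 5)  # True: impulse response equals the taps
```

To compare designs, the same input signal can be filtered once with exact multiplication and once with a bit-accurate model of an approximate multiplier passed as `mul`; an unsigned multiplier model would additionally need sign handling for negative samples or coefficients.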
The outputs of the FIR filters based on the Proposed-basic and P-AE multipliers show only small deviations from the standard output, whereas the output waveforms of the FIR filters based on DQ4:2C1 and on the 4-2 compressor-based approximate multiplier show the highest and moderate deviations, respectively.
In this work, two area-efficient approximate 4:2 compressors targeted at PP compression in multipliers are proposed. The compressor logic is realized such that the first variant generates the sum without error and the second variant generates the carry without error. Evaluations revealed that the proposed compressors fare better in gate count and error than the previous designs in the literature. Implementing the new compressors in the Dadda multiplier demonstrated the superior processing speed and accuracy of the proposed multiplier compared with earlier designs. The area-efficient and error-recovery variants of the proposed multiplier achieved better area efficiency at a trade-off in accuracy. Finally, the proposed multipliers are implemented in signal and image processing applications to verify their functionality and output quality. Visual examination of the processed images and signals shows that the proposed inexact multipliers perform similarly to the standard design with minimal error deviation.
We thank the anonymous referees for their useful suggestions.