

# High Performance FIR Filter Architecture for Fixed and Reconfigurable Applications

<sup>[1]</sup> Pallavi Rahunath Yewale, <sup>[2]</sup> Aparna Shinde
<sup>[1]</sup> PG Student, <sup>[2]</sup> Assistant Professor
<sup>[1][2]</sup> Dept. of Electronics and Telecommunication Engg
<sup>[1][2]</sup> D. Y. Patil College of Engineering, Akurdi, Pune, India

*Abstract* - The FIR filter with transposed structure has registers between the adders and can achieve high throughput without adding any extra pipeline registers. The transpose form finite impulse response (FIR) filter is a pipelined structure that supports the multiple constant multiplications (MCM) technique, whereas the direct form FIR filter structure does not. The direct form FIR filter needs extra pipeline registers between the adders to reduce the delay of the adder tree and to achieve high throughput. MCM is more effective in transpose form, where a common operand is multiplied with a set of constant coefficients, which reduces the computational delay.

The MCM technique is easier to implement in a fixed-coefficient transpose form FIR filter but complex for reconfigurable coefficients. In the fixed-coefficient transpose form FIR filter, area and delay are reduced by using the MCM technique. A low-complexity design using the MCM technique is implemented for fixed-coefficient transpose form FIR filters, and a multiplier-based design is used for the reconfigurable transpose form FIR filter. The implemented transpose form FIR filter structure achieves less area and delay than the direct form FIR filter structure. The Xilinx software tool is used for simulation.

Index Terms— Transpose form FIR filter, multiple constant multiplications (MCM) technique, Block processing

#### **I. INTRODUCTION**

Low-power, area-efficient finite impulse response (FIR) digital filters are widely used in several digital signal processing applications, such as speech processing, loud speaker equalization, echo cancellation, adaptive noise cancellation, and various communication applications, including software-defined radio (SDR). Many of these applications require FIR filters of large order to meet stringent frequency specifications. Very often these filters need to support a high sampling rate for high-speed digital communication. The number of multiplications and additions required for each filter output, however, increases linearly with the filter order. Since there is no redundant computation available in the FIR filter algorithm, real-time implementation of a large order FIR filter in a resource-constrained environment is a challenging task. In signal processing applications, filter coefficients very often remain constant and are known a priori. This feature has been utilized to reduce the complexity of realizing the multiplications. Several designs have been suggested by various researchers for efficient realization of FIR filters (having fixed coefficients) using distributed arithmetic (DA) and multiple constant multiplication (MCM) methods. DA-based designs use lookup tables (LUTs) to store precomputed results to reduce the computational complexity.

The MCM method, on the other hand, reduces the number of additions required for the realization of multiplications by common subexpression sharing when a given input is multiplied with a set of constants. The MCM scheme is more effective when a common operand is multiplied with a larger number of constants. Therefore, the MCM scheme is suitable for the implementation of large order FIR filters with fixed coefficients. However, MCM blocks can be formed only in the transpose form configuration of FIR filters. The block-processing method is popularly used to derive high-throughput hardware structures. It not only provides a throughput-scalable design but also improves the area-delay efficiency. The derivation of a block-based FIR structure is straightforward when the direct form configuration is used [16], whereas the transpose form configuration does not directly support block processing. To take the computational advantage of MCM, however, the FIR filter must be realized in the transpose form configuration. Apart from that, transpose form structures are inherently pipelined and are expected to offer a higher operating frequency to support a higher sampling rate.
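The difference between the two configurations can be illustrated with a small behavioural sketch in Python. It is not the hardware design of this paper; the function names `fir_direct` and `fir_transpose` are assumptions introduced here only for illustration. In the transpose form, the current sample is multiplied by every coefficient in the same cycle (the situation MCM exploits), and a register sits between each pair of adders.

```python
def fir_direct(h, x):
    """Direct form: each output is an inner product of h with a delay line of x."""
    delay = [0] * len(h)              # delay[i] holds x(n - i)
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        y.append(sum(hi * di for hi, di in zip(h, delay)))
    return y


def fir_transpose(h, x):
    """Transpose form: one input sample multiplies all coefficients at once,
    and the partial sums are held in registers between the adders."""
    reg = [0] * (len(h) - 1)          # reg[i] feeds the adder of tap i
    y = []
    for sample in x:
        products = [hi * sample for hi in h]   # common operand, many constants
        y.append(products[0] + (reg[0] if reg else 0))
        # Each register stores its product plus the old value of the next register.
        reg = [products[i + 1] + (reg[i + 1] if i + 1 < len(reg) else 0)
               for i in range(len(reg))]
    return y
```

Both functions are expected to return the same output sequence for the same coefficients and input, which is a quick way to check the transposed data flow.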

#### **II. METHODOLOGY**

### 2.1 Existing method

There are several applications where the coefficients of FIR filters remain fixed, while some other applications, such as the SDR channelizer, require separate FIR filters of different specifications to extract one of the desired narrowband channels from the wideband RF front end. These FIR filters need to be implemented in a reconfigurable FIR (RFIR) structure to support multistandard wireless communication. In this section, we present a structure of block FIR filter for such reconfigurable applications. We also discuss the implementation of the block FIR filter for fixed filters using the MCM scheme.

### 2.2 Proposed Structure for Transpose Form Block FIR Filter for Reconfigurable Applications

The proposed structure for the block FIR filter [based on the recurrence relation of (12)] is shown in Fig. 1 for the block size L = 4. It consists of one coefficient selection unit (CSU), one register unit (RU), M inner product units (IPUs), and one pipeline adder unit (PAU). The CSU stores the coefficients of all the filters to be used for the reconfigurable application.



Fig 1. Proposed structure for block FIR filter.



The CSU is implemented using N ROM LUTs, such that the filter coefficients of any particular channel filter are obtained in one clock cycle, where N is the filter length. The RU [shown in Fig. 7(a)] receives $x_k$ during the kth cycle and produces L rows of $S_k^0$ in parallel. The L rows of $S_k^0$ are transmitted to the M IPUs of the proposed structure. The M IPUs also receive M short-weight vectors from the CSU: the (m+1)th IPU receives the weight vector $c_{M-m-1}$ from the CSU and the L rows of $S_k^0$ from the RU.



Figure 3 Structure of (m + 1)th IPU.



Fig 5. Structure of PAU for block size L = 4.

Each IPU performs the matrix-vector product of $S_k^0$ with the short-weight vector $c_m$ and computes a block of L partial filter outputs ($r_k^m$). Therefore, each IPU performs L inner product computations of the L rows of $S_k^0$ with a common weight vector $c_m$. The structure of the (m+1)th IPU is shown in Fig. 3. It consists of L L-point inner-product cells (IPCs). The (l+1)th IPC receives the (l+1)th row of $S_k^0$ and the coefficient vector $c_m$, and computes a partial result of the inner product $r^m(kL - l)$, for $0 \le l \le L - 1$. The internal structure of the (l+1)th IPC for L = 4 is shown in Fig. 8(a). All the M IPUs work in parallel and produce M blocks of results ($r_k^m$). These partial inner products are added in the PAU (shown in Fig. 5) to obtain a block of L filter outputs. In each cycle, the proposed structure receives a block of L inputs and produces a block of L filter outputs, where the duration of each cycle is $T = T_M + T_A + T_{FA} \log_2 L$, $T_M$ is one multiplier delay, $T_A$ is one adder delay, and $T_{FA}$ is one full-adder delay.
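The data flow through the RU, IPUs and PAU can be sketched behaviourally in Python. This is only a sketch under stated assumptions, not the hardware described here: the function name `block_fir` is hypothetical, M is taken as N/L, the short-weight vector $c_m$ is assumed to hold h(mL) to h(mL + L - 1), zero-based sample indices are used so that block k ends at sample kL + L - 1, and the PAU is modelled as delaying the mth partial block by m block cycles.

```python
def block_fir(h, x, L):
    """Block FIR sketch: each cycle consumes L inputs and yields L outputs."""
    N = len(h)
    assert N % L == 0 and len(x) % L == 0
    M = N // L                          # assumed number of IPUs
    # Short-weight vectors c_m = [h(mL), ..., h(mL + L - 1)] (assumed layout).
    c = [h[m * L:(m + 1) * L] for m in range(M)]

    def sample(n):
        """x(n), taken as zero outside the recorded input."""
        return x[n] if 0 <= n < len(x) else 0

    history = []                        # history[k][m] is the block r_k^m
    outputs = []
    for k in range(len(x) // L):
        newest = k * L + L - 1          # most recent sample index in block k
        # RU: row l, column j of S_k^0 holds x(newest - l - j).
        S = [[sample(newest - l - j) for j in range(L)] for l in range(L)]
        # IPUs: each multiplies S_k^0 by one short-weight vector.
        r = [[sum(S[l][j] * c[m][j] for j in range(L)) for l in range(L)]
             for m in range(M)]
        history.append(r)
        # PAU: add the partial blocks, the m-th one taken from m cycles earlier.
        block = [sum(history[k - m][m][l] for m in range(M) if k - m >= 0)
                 for l in range(L)]
        # Row l corresponds to y(newest - l); reverse to emit in time order.
        outputs.extend(reversed(block))
    return outputs
```

Comparing the result with a sample-by-sample convolution of the same input is a simple check that the block decomposition preserves the filter output.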

### 2.3 MCM-Based Implementation of Fixed-Coefficient FIR Filter

We discuss the derivation of MCM units for the transpose form block FIR filter and the design of the proposed structure for fixed filters. For fixed-coefficient implementation, the CSU of Fig. 1 is no longer required, since the structure is tailored to only one given filter. Similarly, IPUs are not required. The multiplications are mapped to the MCM units for a low-complexity realization. In the following, we show that the proposed formulation for MCM-based implementation of the block FIR filter makes use of the symmetry in the input matrix $S_k^0$ to perform horizontal and vertical common subexpression elimination and to minimize the number of shift-add operations in the MCM blocks. As shown in Table I, MCM can be applied in both the horizontal and vertical directions of the coefficient matrix. The sample x(4k-3) appears in four rows or four columns of the following input matrix, whereas x(4k) appears in only one row or one column.

$$
R = \begin{bmatrix}
x(4k) & x(4k-1) & x(4k-2) & x(4k-3) \\
x(4k-1) & x(4k-2) & x(4k-3) & x(4k-4) \\
x(4k-2) & x(4k-3) & x(4k-4) & x(4k-5) \\
x(4k-3) & x(4k-4) & x(4k-5) & x(4k-6)
\end{bmatrix}
\begin{bmatrix}
h(0) & h(4) & h(8) & h(12) \\
h(1) & h(5) & h(9) & h(13) \\
h(2) & h(6) & h(10) & h(14) \\
h(3) & h(7) & h(11) & h(15)
\end{bmatrix}
$$

Therefore, all four rows of the coefficient matrix are involved in the MCM for x(4k-3), while only the first row of coefficients is involved in the MCM for x(4k). For larger values of N or smaller block sizes, the row size of the coefficient matrix is larger, which results in a larger MCM size across all the samples and hence a larger saving in computational complexity.
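The saving from common subexpression sharing can be illustrated with a small Python sketch. The constants 5, 10, 13 and 21 and the function names are assumptions chosen only for illustration; they are not coefficients of the filter implemented here. The product 5x is computed once and reused, so the four products need three additions instead of the six needed when each constant is built separately from its binary representation.

```python
def mcm_shared(x):
    """Multiply x by the constants 5, 10, 13 and 21 using three additions."""
    x5 = (x << 2) + x        # 5x = 4x + x, the shared subexpression
    x10 = x5 << 1            # 10x = 2 * 5x, a shift only
    x13 = (x << 3) + x5      # 13x = 8x + 5x
    x21 = (x << 4) + x5      # 21x = 16x + 5x
    return {5: x5, 10: x10, 13: x13, 21: x21}


def adders_without_sharing(constants):
    """Adders needed when each positive constant is built alone from the
    set bits of its binary form (one adder fewer than the number of set bits)."""
    return sum(bin(c).count("1") - 1 for c in constants)
```

For these four constants, `adders_without_sharing([5, 10, 13, 21])` evaluates to 6, against the three additions used in `mcm_shared`.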

Table I: MCM in transpose form block FIR filter of length N = 16 and block size L = 4

| Input sample | Coefficient group              |
|--------------|--------------------------------|
| x(4k)        | $\{h(0), h(4), h(8), h(12)\}$  |
| x(4k-1)      | $\{h(0), h(4), h(8), h(12)\}$  |
|              | $\{h(1), h(5), h(9), h(13)\}$  |
| x(4k-2)      | $\{h(0), h(4), h(8), h(12)\}$  |
|              | $\{h(1), h(5), h(9), h(13)\}$  |
|              | $\{h(2), h(6), h(10), h(14)\}$ |
| x(4k-3)      | $\{h(0), h(4), h(8), h(12)\}$  |
|              | $\{h(1), h(5), h(9), h(13)\}$  |
|              | $\{h(2), h(6), h(10), h(14)\}$ |
|              | $\{h(3), h(7), h(11), h(15)\}$ |
| x(4k-4)      | $\{h(1), h(5), h(9), h(13)\}$  |
|              | $\{h(2), h(6), h(10), h(14)\}$ |
|              | $\{h(3), h(7), h(11), h(15)\}$ |
| x(4k-5)      | $\{h(2), h(6), h(10), h(14)\}$ |
|              | $\{h(3), h(7), h(11), h(15)\}$ |
| x(4k-6)      | $\{h(3), h(7), h(11), h(15)\}$ |
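The grouping of coefficients per input sample follows from where each sample sits in the input matrix, and it can be reproduced with a short Python sketch. The function name `coefficient_groups` and the use of an integer offset d to denote the sample x(Lk - d) are assumptions made here for illustration.

```python
def coefficient_groups(N, L):
    """Map each offset d, meaning the sample x(Lk - d), to the coefficient
    groups (lists of coefficient indices) that it multiplies."""
    M = N // L
    # Row j of the coefficient matrix: h(j), h(j + L), ..., h(j + (M - 1)L).
    rows = [[j + m * L for m in range(M)] for j in range(L)]
    groups = {}
    for d in range(2 * L - 1):
        # x(Lk - d) sits at row l, column j of the input matrix when l + j == d,
        # and column j of the input matrix multiplies row j of the coefficients.
        first = max(0, d - L + 1)
        last = min(L - 1, d)
        groups[d] = [rows[j] for j in range(first, last + 1)]
    return groups
```

Calling `coefficient_groups(16, 4)` gives one group for offsets 0 and 6 and four groups for offset 3, in line with the table.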



Fig 6. Proposed MCM-based structure.

### III. PROPOSED METHOD

### 3.1 MCM-Based Implementation of Fixed-Coefficient FIR Filter Using CSA





Fig 7. Proposed MCM structure with CSA.

In the proposed methodology, a carry-skip adder (CSA) is used instead of a ripple-carry adder, which increases the speed of the design and reduces area and delay. The result of this method is compared with the conventional method in Table II, which shows that the carry-skip adder offers higher performance and speed than the conventional adder. The simulation result and waveform of the proposed implementation are shown in the results section; the design has less delay than the one using the conventional adder.



Fig 8. Schematic of carry-skip adder.

The main idea in a carry-bypass (carry-skip) adder is that the carry generation in a block is based on the generate and propagate signals. Suppose signals Ai and Bi are the inputs to an adder and the values of A and B are such that all the propagate signals are high; then the carry-out is equal to the carry-in. Hence, when all the propagate signals are equal to one, the incoming carry is sent directly to the next block rather than passing through all the individual adder cells. When the propagate signals are not all equal to one, the carry propagates through the cells in the usual way. This is illustrated in the 4-bit carry-skip construction in Fig. 8. When P0, P1, P2 and P3 are all equal to one, the carry Cin appears at the output through the bypass rather than propagating through all the cells. This is used mainly to increase the speed of operation of the adder.
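The bypass condition described above can be modelled at bit level with a short Python sketch. It is a functional model only and says nothing about the actual delay of the hardware; the function name `carry_skip_add` and the default `width` and `block` parameters are assumptions introduced for illustration.

```python
def carry_skip_add(a, b, cin=0, width=16, block=4):
    """Bit-level model of a carry-skip adder; returns (sum, carry_out)."""
    total = 0
    carry = cin
    for base in range(0, width, block):
        block_cin = carry                 # carry entering this block
        all_propagate = True
        for i in range(base, min(base + block, width)):
            ai = (a >> i) & 1
            bi = (b >> i) & 1
            p = ai ^ bi                   # propagate signal
            g = ai & bi                   # generate signal
            total |= (p ^ carry) << i     # sum bit of this cell
            carry = g | (p & carry)       # ripple carry to the next cell
            all_propagate = all_propagate and p == 1
        if all_propagate:
            # Bypass: the block carry-out is the block carry-in. The ripple
            # path yields the same value; hardware gains speed by selecting it.
            carry = block_cin
    return total, carry
```

For example, adding 0b0101 and 0b1010 with a carry-in of one and `width=4` sets every propagate signal, so the carry-in is passed straight to the carry-out.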

### IV. EXPERIMENTAL RESULTS

The proposed structure for reconfigurable applications gives a better result than the conventional direct form FIR filter. The result for block size 4 and filter length 16 is shown in Fig. 9 with the output waveform. By using this technique, the delay is reduced.



Fig 9. Simulation result of reconfigurable filter.

The simulation result of the MCM-based implementation of the fixed-coefficient FIR filter is shown in Fig. 10 for block size 8 and filter length 16.



Fig 10. Simulation result of fixed FIR filter.



The simulation result of the MCM-based implementation of the fixed-coefficient FIR filter using the carry-skip adder is shown in Fig. 11 for block size 8 and filter length 16. This design has less delay and power, and higher speed, than the design using the conventional adder.



Fig 11. Simulation result of proposed fixed FIR filter.

Table II: Comparison of simulation results of FIR filters

| Design                                                                         | Delay (ns) | Power (mW) | No. of LUTs       |
|--------------------------------------------------------------------------------|------------|------------|-------------------|
| Transpose form block FIR filter for reconfigurable application, L = 4 & N = 16 | 10.190     | 85         | 128 out of 9312   |
| MCM-based fixed-coefficient FIR filter, L = 4 & N = 16                         | 5.281      | 37         | 490 out of 27288  |
| Proposed MCM-based fixed-coefficient FIR filter using CSA, L = 4 & N = 16      | 3.726      | 84         | 2097 out of 92152 |

Table II shows the performance of the FIR filter for fixed and reconfigurable applications and compares it with the previous direct form FIR filter. The comparison shows that the proposed method performs better than the existing designs, with the lowest delay (3.726 ns) of the three designs listed.

### **V. CONCLUSION**

The design of the FIR filter for fixed and reconfigurable applications using the MCM scheme reduces area as well as delay compared to the conventional direct form FIR filter. The proposed structure involves a significantly lower area-delay product. The simulation was carried out using Xilinx 14.2 software. The simulation results show a delay reduction of 4.909 ns compared to the conventional direct form FIR filter. For the modified MCM scheme using the CSA structure, the number of LUTs is also reduced. A carry-skip adder reduces the delay and increases the overall speed of the design over a conventional adder. With the proposed design, the area-delay product (ADP) and energy per sample (EPS) are reduced.

### REFERENCES

[1] B. K. Mohanty and P. K. Meher, "A high-performance FIR filter architecture for fixed and reconfigurable applications," IEEE Trans. on VLSI, Feb. 2015.

[2] S. Y. Park and P. K. Meher, "Efficient FPGA and ASIC realizations of a DA-based reconfigurable FIR digital filter," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 61, no. 7, pp. 511–515, Jul. 2014.

[3] B. K. Mohanty and P. K. Meher, "A high-performance energy-efficient architecture for FIR adaptive filter based on new distributed arithmetic formulation of block LMS algorithm," IEEE Trans. Signal Process., vol. 61, no. 4, pp. 921–932, Feb. 2013.

[4] R. Mahesh and A. P. Vinod, "New reconfigurable architectures for implementing FIR filters with low complexity," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 2, pp. 275–288, Feb. 2010.

[5] A. P. Vinod and E. M. Lai, "Low power and high-speed implementation of FIR filters for software defined radio receivers," IEEE Trans. Wireless Commun., vol. 7, no. 5, pp. 1669–1675, Jul. 2006.

[6] J. Park, W. Jeong, H. Mahmoodi-Meimand, Y. Wang, H. Choo, and K. Roy, "Computation sharing programmable FIR filter for low-power and high-performance applications," IEEE J. Solid State Circuits, vol. 39, no. 2, pp. 348–357, Feb. 2004.

[7] K.-H. Chen and T.-D. Chiueh, "A low-power digit-based reconfigurable FIR filter," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 617–621, Aug. 2006.

[8] A. P. Vinod and E. M. Lai, "Low power and high-speed implementation of FIR filters for software defined radio receivers," IEEE Trans. Wireless Commun., vol. 7, no. 5, pp. 1669–1675, Jul. 2006.