# Design and FPGA-Implementation of 1<sup>st</sup>-Order 4D IIR Frequency-Hyperplanar Digital Filters

Arjuna Madanayake\* and Randeel Wimalagunaratne Electrical and Computer Engineering University of Akron Akron, OH, 44325-3904 Email: arjuna@uakron.edu Donald G. Dansereau

Australian Centre for Field Robotics
School of Aerospace, Mechanical and
Mechatronic Engineering
University of Sydney, NSW, Australia
Email: d.dansereau@acfr.usyd.edu.au

Len T. Bruton
\*Electrical and Computer Engineering
University of Calgary
Calgary, AB, Canada

Abstract—Four-dimensional (4D) 1st-order frequency-hyperplanar digital filters have emerging applications in computed tomography (CT), volumetric ultrasound, and light field processing for computer vision. A novel IIR architecture for real-time implementation of such filters is proposed, and two single-chip prototypes are implemented on Xilinx Virtex-4 Sx35ff668-10 FPGAs. Experimental results from the FPGA physical implementation as well as simulated 4D frequency responses of the two 4D IIR filter realizations are used to quantify the proposed circuits in terms of frequency response, speed and FPGA resource utilization.

### I. Introduction

The increasing complexity and capability of programmable logic devices is opening a wide range of difficult high-dimensional problems to real-time hardware implementation. Examples include five-dimensional (5D) motion estimation and four-dimensional (4D) segmentation in computed tomography (CT) and volumetric ultrasound imagery [1]–[3], 4D seismic monitoring [4], and 4D computer vision applications such as depth filtering and distractor removal [5]–[8].

Within these applications, a large number of problems can be addressed by linear filters which can be elegantly implemented in digital hardware. This work is concerned with the formulation of digital hardware architectures appropriate to these higher-dimensional problems.

Although the range of potential applications is broad, for the purpose of evaluating our technique we focus on the problem of depth filtering from light field camera arrays. In particular, we propose a novel architecture for hardware implementation of the 4D 1<sup>st</sup>-order frequency-hyperplanar digital filter proposed in [9]. We implement the filter on a single Xilinx Virtex-4 *Sx35-10ff668* FPGA device, demonstrating that what would normally be a processor-intensive and possibly slow operation can be implemented on specialized hardware, allowing accelerated or even real-time operation [10].

### II. BACKGROUND: FILTERING OVER 4D LIGHT FIELDS

Light fields first came about as an image-based approach to computer graphics [11], [12], foregoing geometric models for a collection of images describing the behaviour of light permeating a scene. They have since gathered attention in image processing, allowing linear filtering techniques to accomplish



Fig. 1. The two-plane parameterization of light rays.

complex tasks such as depth filtering and distractor isolation from moving cameras [8], [9].

The light permeating a static scene is most completely described in terms of the continuous plenoptic function  $\mathcal{L}(\mathbf{x},\hat{\mathbf{r}})$ , which varies with position  $\mathbf{x}=(X,Y,Z)$  and direction  $\hat{\mathbf{r}}=(r_x,r_y,r_z),\|\hat{\mathbf{r}}\|=1$  [13] – note that direction has only two degrees of freedom, and so the total dimensionality of the plenoptic function is five. All cameras sample subsets of this function, and a pinhole camera in particular measures a pencil of rays passing through a fixed position  $\mathbf{x}$ .

A light field camera measures a larger subset of the plenoptic function by placing multiple apertures on a 2D grid, yielding a 4D subset of the plenoptic function in two angular and two spatial dimensions. It may seem that constraining the apertures to a plane represents a loss of information in the third spatial dimension. However, when operating in a non-attenuating medium such as air, and in the absence of occlusion, rays do not change in value along their direction of propagation, and so the third spatial dimension can be discarded [11]. We operate under the assumption that occlusion accounts for relatively little of a scene's energy.

A convenient way to parametrize the 4D light field is using the two-plane parametrization (2PP) depicted in Fig. 1, in which each of the measured rays is described by its points of intersection with two reference planes: the  $(s,t) \in \mathbb{R}^2$  plane given by z=0, and the  $(u,v) \in \mathbb{R}^2$  plane which is parallel to the (s,t) plane at some positive separation z=D. One can imagine the (s,t) coordinate as fixing the position of a

ray, with (u,v) fixing its *direction*. Alternatively, the (s,t) plane can be imagined as a grid of pinhole cameras facing the (u,v) plane. Fixing (s,t) selects a specific pinhole camera, and (u,v) act as pixel coordinates for that camera, skewed such that all cameras share common (u,v) coordinates.

We denote the continuous-domain 4D light field L(s,t,u,v), which, following uniform sampling at  $(\Delta s, \Delta t, \Delta u, \Delta v) \in \mathbb{R}^4$ , leads to a discrete-domain 4D input tensor  $\mathbf{x}(n_1\Delta s, n_2\Delta t, n_3\Delta u, n_4\Delta v) \overset{\mathbf{4D}}{\Leftrightarrow} X(\mathbf{z}), \mathbf{z} \equiv (z_1, z_2, z_3, z_4) \in \mathbb{C}^4$ . The output of a 4D IIR filter is another light field  $\mathbf{y}(n_1\Delta s, n_2\Delta t, n_3\Delta u, n_4\Delta v) \overset{\mathbf{4D}}{\Leftrightarrow} Y(\mathbf{z})$ .

In [9], a 1<sup>st</sup>-order frequency-hyperplanar filter was proposed using the resonant properties of resistively terminated passive 4D inductance-resistance networks [14]. Ideal frequency-hyperplanar filters are planar resonant on the 4D passband hyperplane [9]

$$\omega_1 L_1 + \omega_2 L_2 + \omega_3 L_3 + \omega_4 L_4 = 0 \tag{1}$$

and have the 4D Laplace transform input-output transfer function

$$H(s_1, s_2, s_3, s_4) = \frac{R}{R + \sum_{k=1}^{4} L_k s_k} \equiv \frac{Y(s_1, s_2, s_3, s_4)}{X(s_1, s_2, s_3, s_4)}$$
(2)

where  $s_k = \sigma_k + j\omega_k$ ,  $(\sigma_k, \omega_k) \in \mathbb{R}^2$ , k = 1, 2, 3, 4, for passive synthesis parameters  $L_k \geq 0$ , k = 1, 2, 3, 4, and R > 0. This hyperplanar filter has the desirable property of selectively filtering a light field for a prescribed *depth*. That is, one may selectively extract scene elements which lie within a plane parallel to and at a prescribed distance from the light field camera, attenuating all scene elements which are outside of the selected plane. It is this filter (2) which motivates our hardware implementation.

# III. SCANNED-ARRAY 4D IIR FILTER ARCHITECTURE

We propose a 4D IIR filter architecture starting from a one-dimensional (1D) direct-form (type II) signal flow graph (SFG). Direct-form II SFGs results in canonical structures that consume the lowest memory resources, leading to the smallest number of first-in-first-out (FIFO) buffers in the digital filter circuit. This important advantage is retained in the proposed 4D filter architecture. The authors believe the proposed novel 4D filter architecture to be the first and only such digital hardware architecture in the current literature.

A. 1<sup>st</sup>-order 4D IIR frequency-hyperplanar filters from network resonance

The 4D IIR frequency-hyperplanar z-domain transfer function corresponding to (2) is given by

$$H(\mathbf{z}) = \frac{\prod_{k=1}^{4} (1 + z_k^{-1})}{1 + \sum_{i=0}^{1} \sum_{h=0}^{1} \sum_{k=0}^{1} \sum_{l=0}^{1} b_{ihkl} z_1^{-i} z_2^{-h} z_3^{-k} z_4^{-l}} = \frac{Y(\mathbf{z})}{X(\mathbf{z})},$$

$$(3)$$





Fig. 2. Block definitions for the raster-scanned 4D frequency-hyperplanar filter architecture (top). Raster-scanned first-order 4D IIR frequency-hyperplanar digital filter architecture in direct-form (type II) (bottom) [10].

where

$$b_{ihkl} \equiv \frac{\left(R + (-1)^i L_1 + (-1)^h L_2 + (-1)^k L_3 + (-1)^l L_4\right)}{\left(R + \sum_{K=1}^4 L_K\right)}.$$
(4)

The 4D input-output tensors  $x(\mathbf{n})$  and  $y(\mathbf{n})$  are related to the corresponding 4D **z**-transforms via

$$x(\mathbf{n}) \stackrel{4D}{\Leftrightarrow} X(\mathbf{z}),$$

$$y(\mathbf{n}) \stackrel{4D}{\Leftrightarrow} Y(\mathbf{z}),$$
(5)

where  $\mathbf{n} = (n_1, n_2, n_3, n_4) \in \mathbb{Z}^4$ . Here,  $b_{ihkl}$  are 4D filter feedback coefficients. The first-order practical-BIBO stable



Fig. 3. Spatial delay processor (SDP\_D) circuit for ZICs in the depth-wise dimension  $n_3$  [10].

4D IIR frequency-hyperplanar digital filter is implemented using the 4D recursive difference equation under zero initial conditions (ZICs),

$$y(\mathbf{n}) = \sum_{i=0}^{1} \sum_{h=0}^{1} \sum_{k=0}^{1} \sum_{l=0}^{1} x(n_{1} - i, n_{2} - h, n_{3} - k, n_{4} - l) - \sum_{i=0}^{1} \sum_{h=0}^{1} \sum_{k=0}^{1} \sum_{l=0}^{1} \sum_{l=0}^{1} b_{ihkl} y(n_{1} - i, n_{2} - h, n_{3} - k, n_{4} - l).$$

$$(6)$$

# B. Raster-scanned 4D volume arrays or light field tensors

In array processing, the required raster-scanned input stream is obtained by uniformly raster-scanning a 3D  $N_1 \times N_2 \times N_3$  volume sensor-array, first along columns, then rows, and finally depth. Alternatively, for light fields, the input tensor  $x(\mathbf{n})$  is of size  $N_1 \times N_2 \times N_3 \times N_4$ . The sampled stream of data, denoted as  $w_{SCAN}(k) = x(\mathbf{n})$ , is used for 4D filtering operations, which produces a corresponding filtered output stream  $y_{SCAN}(\mathbf{n})$ . The raster-scanned input stream and filtered output stream are obtained under the raster-scanned sampling relation  $k = N_1 N_2 N_3 n_4 + N_1 N_2 n_3 + N_1 n_2 + n_1$ . The circuit clock frequency is  $F_{CLK} = 1/\Delta T_{AS}$  where  $\Delta T_{AS} = \Delta T_S/N_1 N_2 N_3$  and  $\Delta T_S$  is the uniform volume-frame sample-time (i.e. temporal inter-sample time for each sensor).

## C. 4D direct-form (type II) structure

The proposed architecture [10] (see Fig. 2) is a novel 4D direct-form (type II) structure. ZICs, being essential for practical-BIBO stability, are obtained using spatial-delay-processor (SDP) blocks. The architecture requires 15 parallel multipliers, 19 two-input adders including the 16-input adder tree shown in Fig. 2, one volume-array clocked FIFO buffer  $\Gamma$  of length  $N_1N_2N_3$  (which requires the most memory resources), three  $SDP_Ds$ , five  $SDP_Rs$ , and nine  $SDP_C$  ZIC circuits.

Column- and row-wise ZICs are achieved using column and row SDP circuits, denoted  $SDP_C$  and  $SDP_R$  respectively

[15]. A novel depth-wise spatial delay processor [10]  $SDP_D$ , that is a 4D extension of SDPs in [15], is proposed in Fig. 3. SDPs present a zero value at their output when a ZIC is required, and pass the input signal unchanged otherwise.

# D. VLSI resource consumption and critical path delay

If the VLSI hardware requirements for  $W_{mul}$ -bit multipliers and adders are  $\gamma_M$ , and  $\gamma_{A/S}$ , respectively, then the total VLSI resource consumption of a circuit is approximated by [10]  $\gamma_T \approx 15\gamma_M + 19\gamma_{A/S} + W_q\Gamma K_o + K_1 + K_2$ , where  $\Gamma = N_1 N_2 N_3$ ,  $K_0$  are the total VLSI resource requirements for a one-bit delay buffer,  $K_1$  are the VLSI resource requirements for the ZIC circuits, and  $K_2$  are additional resource requirements for other delay buffer circuits and quantizers.

The minimum critical path delay (CPD) of the circuit, following pipelining, is given by  $T_{CPD} \approx T_M + T_{A/S} + T_{MUX}$ , where  $T_M$ ,  $T_{A/S}$  and  $T_{MUX}$  are the propagation delays of a parallel multiplier, adder/subtractor, and two-input W-bit multiplexer, respectively. Therefore, the maximum clock frequency is  $F_{CLK,Max} = 1/T_{CPD}$ .

# E. Estimated real-time throughput

The real-time throughput is  $15F_{CLK}$  fixed-point multiplications and  $19F_{CLK}$  additions/subtractions per second, corresponding to the volume-rate  $F_S = F_{CLK}/N_1N_2N_3$  Hz. For a light field having dimensions  $N_1 \times N_2 \times N_3 \times N_4$ , the architecture completes the filtering operation in  $T_o = F_{CLK}/N_1N_2N_3N_4$  seconds.

## IV. PROTOTYPES ON XILINX VIRTEX-4 FPGAS

Two examples of the 4D IIR frequency-hyperplanar filter were physically implemented on a Xilinx Virtex-4 Xc4vSx35-10ff668 FPGA device installed on a Xilinx XtremeDSP kit-4 development board. The two designs are summarized in Table I, in which W and D define precision as input word size and binary point position,  $W_{out}$ ,  $D_{out}$  define output precision,  $W_{mul}$ ,  $D_{mul}$  define coefficient precision, and  $W_q$ ,  $D_q$  define the precision of the quantizer immediately preceding the largest FIFO buffer, having depth  $N_1N_2N_3$ .

Example 1 is a self-contained FPGA implementation fully operational on a single Sx35 device. FPGA resource consumption and CPD both linearly increases with fixed-point precision levels. The CPD must be reduced as much as possible for high speeds of operation, thereby leading to the need for smaller circuits (lower precision). Therefore, the finite word sizes of the hardware design has been scaled to various fixed-point precision levels in order to meet both place-and-route and timing requirements. Example 2 is a similar design but has support for a larger light field. For this reason, its largest buffer – a  $64 \times 64 \times 8$  17-bit FIFO – is located on the host computer, interfaced through a 32-bit PCI interface.

The PCI cores, glue logic and related drivers needed for on-FPGA verification are provided transparently via the hardware co-simulation (HCS) facility of the XtremeDSP Kit-4. The 4D input-output frequency-response of each prototype was measured using HCS by taking the 4D fast Fourier transform

TABLE I DESIGN PARAMETERS AND MEASURED RESULTS FOR EXAMPLES 1 & 2.

|                               | Example 1 | Example 2 |
|-------------------------------|-----------|-----------|
| $\overline{W}$                | 8         | 8         |
| D                             | 0         | 0         |
| $W_{mul}$                     | 17        | 17        |
| $D_{mul}$                     | 16        | 16        |
| $N_1$                         | 32        | 64        |
| $N_2$                         | 32        | 64        |
| $N_3$                         | 10        | 8         |
| $N_4$                         | 16        | 16        |
| $W_{out}$                     | 26        | 24        |
| $D_{out}$                     | 17        | 15        |
| $W_q$                         | 17        | 17        |
| $D_q$                         | 9         | 9         |
| Clock frequency $F_{CLK,Max}$ | 18.21 MHz | 18.27 MHz |
| Number of occupied Slices     | 8,293     | 8,327     |
| Total number of 4 input LUTs  | 15,866    | 16,219    |
| Number of Slice Flip Flops    | 14,619    | 15,130    |
| Number of 4 input LUTs        | 15,550    | 15,945    |
| Number of DSP48s              | 64        | 64        |
| Number of bonded IOBs         | 35        | 67        |
| Number used as BUFGs          | 1         | 1         |
| Number of SLICEMs             | 7,378     | 7,465     |
| $N_1N_2N_3$ FIFO buffer       | On-FPGA   | Ext. RAM  |

(FFT) of the unit impulse response  $h(\mathbf{n}) \stackrel{4D}{\Leftrightarrow} H(\mathbf{z})$ , which was measured and confirmed on-chip using the XtremeDSP Kit-4 for a unit impulse input  $w_{SCAN}(k\Delta T_{AS}) = \delta(k)$ . In Fig. 4 (top), we show a 2D slice of the ideal 4D frequency response (for  $\omega_{3,4}=0$ ) associated with the ideal 4D unit impulse response  $h_{Ideal}(\mathbf{n})$  as defined by (6). Fig.4 (mid) & (bottom) correspond to the hardware-derived 4D unit impulse responses, obtained from bit-true cycle-accurate logic simulation, for Examples 1 and 2, respectively.

# V. CONCLUSIONS

A novel hardware architecture was proposed for the implementation of 4D IIR frequency-hyperplanar digital filters, with applications in 3D video processing, volumetric-array beamforming, and light field processing. The proposed 4D IIR digital filter structure is, to our knowledge, the first of any kind to be proposed for realizing 1st order 4D IIR frequencyhyperplanar transfer functions. Two 4D raster-scanned architectures have been designed, simulated, and implemented on FPGA chips using Xilinx programmable logic technology, and their correct operation has been verified using stepped onchip hardware co-simulation where test vectors were routed to the physical FPGA circuit implementation using Matlabbased HCS hardware-in-the-loop test features.

Reducing the computational complexity using 4D fast algorithms, minimizing CPD, removing quantization effects, estimating power consumption, and trials with real-world data remain for future work.

### REFERENCES

[1] D. Yang, W. Lu, D. Low, J. Deasy, A. Hope, and I. El Naqa, "4D-CT motion estimation using deformable image registration and 5D respiratory motion modeling," Medical physics, vol. 35, p. 4577, 2008.



2D slices  $|H(e^{j\omega_1}, e^{j\omega_2}, 1, 1)|$  of the 4D magnitude frequency response  $|H(e^{j\omega_1}, e^{j\omega_2}, e^{j\omega_3}, e^{j\omega_4})|$  for  $\omega_3 = \omega_4 = 0$ : ideal (top), FPGA Example 1 (mid), and FPGA Example 2 (bottom); the latter two are obtained from 4D FFT of the on-chip measurement  $h(\mathbf{n})$  of the fixed-point designs in TABLE I; all plots are shown over the Nyquist square  $-\pi \le \omega_{1,2} < \pi$ .

- [2] M. Linguraru and R. Summers, "Multi-organ automatic segmentation in 4D contrast-enhanced abdominal CT," in Biomedical Imaging: From Nano to Macro, 2008. ISBI 2008. 5th IEEE Intl. Symp. on. IEEE, 2008, pp. 45-48.
- Q. Duan, E. Angelini, A. Lorsakul, S. Homma, J. Holmes, and A. Laine, "Coronary Occlusion Detection with 4D Optical Flow Based Strain Estimation on 4D Ultrasound," Functional Imaging and Modeling of the Heart, pp. 211-219, 2009.
  [4] R. Calvert, "New 4D seismic monitoring techniques as enablers for
- effective smart fields," in Digital Energy Conf. and Exhibition, 2007.
- [5] D. Dansereau, "4D light field processing and its application to computer vision," Master's thesis, University of Calgary, 2003.
- D. G. Dansereau and L. Bruton, "Gradient-based depth estimation from 4D light fields," in Proceedings of the Intl. Symp. on Circuits and Systems, vol. 3. IEEE, May 2004, pp. 549-552.
- [7] D. G. Dansereau and L. T. Bruton, "A 4-D Dual-Fan Filter Bank for Depth Filtering in Light Fields," *IEEE Trans. on Signal Processing*, vol. 55, no. 2, pp. 542-549, 2007.
- D. G. Dansereau and S. B. Williams, "Seabed modeling and distractor extraction for mobile auvs using light field filtering," in Accepted to Robotics and Automation (ICRA), 2011 IEEE Intl. Conf. on, May 2011.
- [9] D. Dansereau and L. Bruton, "A 4D frequency-planar IIR filter and its application to light field processing," vol. 4, ISCAS 2003, pp. 476-479.
- A. Madanayake, "Real-time FPGA architectures for frequency-planar MDSP," Ph.D. dissertation, University of Calgary, Canada, 2008.
- [11] M. Levoy and P. Hanrahan, "Light field rendering," in Proceedings of the 23rd annual conf. on Computer graphics and interactive techniques. ACM, 1996, pp. 31-42.
- S. Gortler, R. Grzeszczuk, R. Szeliski, and M. Cohen, "The lumigraph," in Proceedings of the 23rd annual conf. on Computer graphics and interactive techniques. ACM, 1996, pp. 43-54.
- [13] E. Adelson and J. Bergen, "The plenoptic function and the elements of early vision," Computational models of visual processing, vol. 1, 1991.
- L. Bruton and N. Bartley, "Three-dimensional image processing using the concept of network resonance," IEEE Trans. on Circuits and Systems, vol. 32, no. 7, pp. 664-672, 1985.
- [15] A. Madanayake and L. Bruton, "A fully-multiplexed first-order frequency-planar module for fan, beam, and cone plane-wave filters," IEEE Trans. Circuits and Systems-II Express Briefs, vol. 53, no. 8, pp. pp. 697-701, August 2006.