# A Time-Domain Layered Finite Element Reduction Recovery Method (LAFE-RR) for High-Frequency VLSI Design

Houle Gan\* and Dan Jiao School of Electrical and Computer Engineering, Purdue University 465 Northwestern Avenue, West Lafayette, IN 47906

# Introduction

IC design has been guided by circuit theory for more than three decades. As onchip designers travel deeper and deeper into the submicron regime, electromagnetic-based analysis has increasingly become essential for highperformance IC design. The reasons are four-fold: (1) Reduced feature sizes. At the 45 nm processing technology node and beyond, the IC industry will have to print features that are several times less than the wavelength of light (193nm) being used. In this regime, light has to be modeled as a wave. (2) Increased clock frequency. Currently the clock frequency of microprocessors is in the gigahertz regime. Since it is necessary to analyze the chip response to harmonics 5 times the clock frequency, it is expected that interconnects at high frequencies would have to be analyzed with certain electromagnetic effects incorporated. The importance of electromagnetic (EM) analysis at tens of GHz was quantitatively demonstrated, via simulation and real silicon measurements, in [1] and [2]. (3) The transition from single core to multicore. Active power management of multicore processors requires large-scale EM analysis of the global power supply network in order to accurately model current variations and transient power drops and ground bounce. (4) Increased level of integration. Integrating RF, analog, and digital circuitry on a same chip leads often to undesirable coupling and sometimes to system failure. For high-frequency mixed-signal design, an electromagnetic solution is required to overcome the fundamental limits of circuit-based solutions.

Since the advent of computational electromagnetics (CEM) in the 1960s, it has evolved into its prime to date. The advancement of CEM has been widely applied to microwave engineering, antenna analysis, scattering analysis, wireless communications, and optoelectronics. It has also been applied to board and packaging problems. However, the existing CEM technology has been found not amenable for very large-scale on-chip design [3]. This is mainly because VLSI design demands very large-scale electromagnetic solutions, which cannot be offered by many current CEM techniques. In addition, the unique modeling challenges of on-chip problems [3] further complicate electromagnetic analysis.

In this paper, we develop a time-domain layered finite element reduction recovery (TD-LAFE-RR) method to solve large-scale IC design problems at high frequencies. This method rigorously reduces the matrix of the original multilayer system to that of a single-layer no matter how large the original problem is. *More* 

This work was supported by a grant from Office of Naval Research under award N00014-06-1-0716

*importantly*, the matrix reduction is achieved *analytically*, and hence the CPU and memory overheads are minimal. Compared to the layered finite-element method we developed in [4], the proposed method further improves the modeling capacity and performance since the matrix reduction is achieved analytically. In addition, developed in time domain, the method permits nonlinear modeling and broadband simulation within one run. Numerical results are given to demonstrate the accuracy and efficiency of the proposed method.

## Formulation

Integrated circuits are generally multilayer structures. A Manhattan-type integrated circuit is even layered in any of x-, y-, and z-direction. Inside these circuits, the electric field **E** satisfies the second-order vector wave equation

$$\nabla \times [\mu_r^{-1} \nabla \times \mathbf{E}(\mathbf{r}, t)] + \mu_0 \mathcal{E} \partial_t^2 \mathbf{E}(\mathbf{r}, t) + \mu_0 \sigma \partial_t \mathbf{E}(\mathbf{r}, t) = -\mu_0 \partial_t \mathbf{J}(\mathbf{r}, t) \text{ in } V \quad (1)$$

subject to certain boundary conditions. A time-domain finite-element solution of (1) and its boundary conditions results in the following system of ordinary differential equations:

$$\mathbf{T}\frac{d^2u}{dt^2} + \mathbf{R}\frac{du}{dt} + \mathbf{S}u + w = j \quad (2)$$

in which T, R, and S are square matrices, and u, w, and j are column vectors. The matrix elements are given by

$$\mathbf{T}_{ij} = \mu_0 \varepsilon \langle \mathbf{N}_i, \mathbf{N}_j \rangle_V; \ \mathbf{R}_{ij} = \mu_0 \sigma \langle \mathbf{N}_i, \mathbf{N}_j \rangle_{V_o}; \ \mathbf{S}_{ij} = \mu_r^{-1} \langle \nabla \times \mathbf{N}_i, \nabla \times \mathbf{N}_j \rangle_V$$
(3)

in which  $N_i$  are the vector basis functions used to expand unknown fields. The dimension of matrix T can be very large for realistic on-chip problems, which constitutes a computational challenge. To solve this problem, we propose a novel efficient and high-capacity solution, TD-LAFE-RR method, as follows.

First, we discretize the computational domain into triangular prism elements. In each prism element, we expand the electric field into prism vector bases. We then order the unknowns layer by layer. By doing so, we generate a banded matrix. Though a banded matrix, its solution remains computationally intensive when the number of unknowns is large. To overcome this problem, we reduce it to a singlelayer matrix instead of solving the entire system matrix as a whole. Assuming the layer of interest is the first layer, we perform a bottom-up procedure to eliminate other layers and project their contributions to the first layer. To do so, we first form the finite-element sub-matrix in the N-th layer. We then eliminate it and project its contribution to the (N-1)-th layer. Next, we work on the modified (N-1)-th layer. 1)-th layer, and project its contribution to the (N-2)-th layer. We continue this procedure until we reach the first layer. An illustration of the layer-by-layer elimination procedure is shown in Fig. 1. From left to right, Fig. 1 lists the submatrix in the N-th layer, the modified sub-matrix in the (N-1)-th layer, ..., and the modified sub-matrix in the first layer. Matrix  $I_i$  is the mass matrix formed by either upper- or lower-plane prism vector bases in the *i*-th layer; matrix  $\mathbf{K}_i$  is the mass matrix formed by upper- and lower- plane vector bases in the *i*-th layer; matrix \* is the mass matrix formed by prism vector bases between the upper and lower planes. Some matrix elements vanish in Fig. 1 because the vector bases in the upper/lower plane are perpendicular to those in-between. Mathematically, the

| I <sub>N</sub> | 0 | K <sub>N</sub> |        | I <sub>N-1</sub> | 0 | K <sub>N-1</sub>  | I <sub>1</sub>        | 0 | <b>K</b> <sub>1</sub> |
|----------------|---|----------------|--------|------------------|---|-------------------|-----------------------|---|-----------------------|
| 0              | * | 0              | $\Box$ | 0                | * | 0                 | 0                     | * | 0                     |
| K              | 0 | I <sub>N</sub> |        | K <sub>N-1</sub> | 0 | I' <sub>N-1</sub> | <b>K</b> <sub>1</sub> | 0 | I'1                   |

Fig. 1. Numerical procedure of the layer-by-layer elimination. layer-by-layer elimination procedure can be represented as (4).

$$\mathbf{I}_{N-1}^{'} = \mathbf{I}_{N-1} + I_{N} - \mathbf{K}_{N} \mathbf{I}_{N}^{-1} \mathbf{K}_{N} \qquad b_{3,N-1}^{'} = b_{3,N-1} - \mathbf{K}_{N} \mathbf{I}_{N}^{-1} b_{3,N}$$
  

$$\mathbf{I}_{N-2}^{'} = \mathbf{I}_{N-2} + I_{N-1} - \mathbf{K}_{N-1} \mathbf{I}_{N-1}^{'-1} \mathbf{K}_{N-1} \qquad (4) \qquad b_{3,N-2}^{'} = b_{3,N-2} - \mathbf{K}_{N-1} \mathbf{I}_{N-1}^{'-1} b_{3,N-1} \qquad (5)$$
  
...  

$$\mathbf{I}_{1}^{'} = \mathbf{I}_{1} + I_{2} - \mathbf{K}_{2} \mathbf{I}_{2}^{'-1} \mathbf{K}_{2} \qquad b_{3,1}^{'} = b_{3,1} - \mathbf{K}_{2} \mathbf{I}_{2}^{'-1} b_{3,2}^{'} \qquad (5)$$

The right hand sides of each layer are also updated during the elimination process, which can be recursively written as (5). After obtaining  $\mathbf{I}_{1}$ , the original system matrix equation is rigorously reduced to a matrix equation that only involves the unknowns in the first layer, which can be solved readily. With the unknowns in the first layer known, we can recover the unknowns in other layers recursively as:

$$x_{3,i} = \mathbf{I}_{i}^{-1} [b_{3,i} - \mathbf{K}_{i} x_{1,i}], i = 1, 2, ..., N$$
(6)

Clearly, the computation in (4), (5), and (6) only involves single-layer unknowns at each step. *More importantly*, the reduction in (4) and (5) is done analytically. This is because matrices I and K in the same layer are actually linearly proportional to each other. In addition, I and K in different layers are also linearly proportional to each other since the matrices I and K are linearly proportional to layer thickness and permittivity. Therefore the final reduced system matrix is obtained instantly without the need of any computation. The only computational cost is the inversion of matrix I' in each layer for the update of the right hand side as shown in (5) as well as for obtaining the unknown vectors as shown in (6). The inversion only needs to be done once since I' in different layers are linearly proportionally to each other. In addition, matrix I' is an extremely sparse matrix; and also is orders-of-magnitude smaller than the original matrix. Hence, the computation of the entire TD-LAFE-RR scheme is inexpensive.

## **Numerical Results**

First, we validate the proposed method with a structure having analytical solution: a parallel-plate waveguide structure. According to typical geometrical dimensions of on-chip problems, its width (along y) was set as 1mu; its height (along x) was set as 0.1mu. Its length (3.5mu) was subdivided into 35 layers. The dominant TEM mode was launched on the incident plane at z=0. The exact absorbing boundary condition for the dominant mode was placed on both the incident and

exiting planes. The incident pulse was the time derivative of a Gaussian pulse,  $\hat{E}^{inc}(t) = \hat{x}2te^{-\left(\frac{t}{\tau}\right)^2}$ , in which  $\tau$  is chosen to be 3.0e-13. The time step was chosen to be 1.0e-16s. In total 15,000 time steps were run. The electric field at z=1.1mu was sampled and compared with the analytical solution, which reveals an excellent agreement as shown in Fig. 2(a). Fig. 2(b) depicts the electric fields sampled at z=0, 0.1mu, 0.6mu, 1.1mu, and 2.9mu respectively. Clearly, the delay and waveform are accurately simulated by the proposed method. Next, we simulated a parallel plate structure of length 1000mu. The bottom and top PEC plates were replaced by conducting planes of conductivity 5.8e+7 to model on-chip intricacy. It took a standard time-domain finite element method 55s to simulate this example, whereas the proposed method only cost 33s although the code was not optimized yet. In addition, the proposed method is memory efficient as it only needs to invert a single-layer sparse matrix. Better efficiency is expected when handling large-scale examples especially those that couldn't be handled by conventional methods. These examples will be reported in the conference.



Fig. 2. Comparison between the proposed solution and the analytical data.

### References

- M. J. Kobrinsky, S. Chakravarty, D. Jiao, et. al., "Experimental validation of crosstalk simulations for on-chip interconnects at high frequencies using Sparameters", IEEE 12th Topical Meeting on Electrical Performance of Electronic Packaging (EPEP), 2003, pp. 329-332.
- [2] D. Jiao, M. Mazumder, et. al., "A novel technique for full-wave modeling of large-scale three-dimensional high-speed on/off-chip interconnect structures," International Conference on Simulation of Semiconductor Proc. and Dev., 2003, pp. 39-42.
- [3] D. Jiao, C. Dai, et. al., "Computational Electromagnetics for High-Frequency IC Design," Intel Corporation, invited paper, IEEE International Symposium on Antennas and Propagation, 2004.
- [4] D. Jiao, "A layered finite-element method for high-capacity electromagnetic analysis of high-frequency ICs," 2006 IEEE International Symposium on Antennas and Propagation.