# A TIMING MODEL OF SEQUENTIAL CIRCUITS FOR EFFICIENT STANDARD CELL LIBRARY CHARACTERIZATION

## **A DISSERTATION**

Submitted in partial fulfillment of the requirements for the award of the degree

of

# **MASTER OF TECHNOLOGY**

in

# ELECTRONICS AND COMMUNICATION ENGINEERING

(With Specialization in Microelectronics and VLSI Technology)

By YOGENDRA SHARMA



DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING INDIAN INSTITUTE OF TECHNOLOGY ROORKEE ROORKEE -247 667 (INDIA) JUNE, 2011

#### CANDIDATE'S DECLARATION

I hereby declare that the work being reported in the dissertation entitled "A Timing Model of Sequential Circuits for Efficient Standard Cell Library Characterization", which is submitted in partial fulfillment of the requirements for the award of the degree of Master of Technology in Microelectronics & VLSI Technology, in the Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee (India), is an authentic record of my own work carried out under the guidance and supervision of Dr. Anand Bulusu, Assistant Professor, and Dr. Ashok Kumar Saxena, Professor, Department of Electronics & Computer Engineering, Indian Institute of Technology Roorkee.

To the best of my knowledge, the matter embodied in this dissertation has not been submitted for the award of any other degree elsewhere.

Dated : 29/06/2011 Place : Roorkee

(Yogendra Sharma)

#### CERTIFICATE

This is to certify that the above statement made by the candidate is correct to the best of my knowledge.

B. Anand

Dr. Anand Bulusu Assistant Professor

Dr. Ashok Kumar Saxena

Professor

#### ACKNOWLEDGEMENT

It gives me great pleasure to take this opportunity to thank and express my deep sense of gratitude to my guides, Dr. Anand Bulusu, Assistant Professor, and Dr. A.K. Saxena, Professor, Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, for sharing their industry experience and valuable guidance. It is under my professors' valuable tutelage that I have learnt different aspects of research.

I express my sincere thanks to Dr. Sudeb Dasgupta, Assistant Professor, Dr. Sanjeev Manhas, Assistant Professor, and Dr. Brijesh Kaushik, Assistant Professor, Department of Electronics and Computer Engineering, for their kind help and moral support throughout my dissertation work. Their suggestions helped me analyze my research work thoroughly.

I would like to thank the research scholars of the Semiconductor Devices and VLSI Technology group and the VLSI Lab staff for their help in learning the HSPICE and CADENCE tools and in installing the various tools needed for my dissertation work.

My sincere and heartfelt gratitude goes to my parents, Mr. S.N. Sharma and Rajkumari Sharma, my sisters, Vandana and Kshama, and my uncle, Mr. H.N. Sharma, whose best wishes, support and encouragement have been a constant source of strength to me during my work.

A special thanks to my friends Arun, Shankar, Dhirendra, Rohan, Sameera, Aditya, Ashish, Akhilesh, Baba and Manish; this work would be incomplete without mentioning them. They have been with me in good times and bad. I would also like to thank Dr. Anil Roy, HOD, AIET, Jaipur, for his constant encouragement, for motivating me to take up this area of work and for sharing his industry experience. Finally, I would like to thank all my batch-mates, seniors, friends and the staff at Ravinder Bhawan, IIT Roorkee, whose constant presence made my work enjoyable.

Last but not least, sincere thanks to my senior Sandeep Miryala and research scholar Baljit ma'am, without whom this work would have remained incomplete.


(Yogendra Sharma)

#### ABSTRACT

Accurate estimation of delay in a circuit is a critical task in deep sub-micron technology due to process, voltage and temperature (PVT) variations. The Look-Up-Table (LUT) based delay estimation method is the most widely used in Static Timing Analysis (STA). In this method, delay is obtained at some values of load capacitance $C_l$ and input transition time $t_{rin}$ using SPICE simulations, and is estimated using linear interpolation at other values of $C_l$ and $t_{rin}$. The timing parameters of a latch (setup time, hold time, etc.) are similarly expressed in the LUT as functions of input transition time, load capacitance and clock skew. One of the major challenges associated with the LUT based method is an appropriate choice of the values of $t_{rin}$, $C_l$ and clock skew that reduces the LUT generation time while increasing the accuracy of delay estimation. Our approach to this issue is to identify regions of linear variation in delay or latch timing parameters with $t_{rin}$, $C_l$ and clock skew. We can then reduce the number of values of $t_{rin}$, $C_l$ and clock skew for which cell characterization needs to be done. In this work, we identify the region of linear variation of the setup time of a CMOS pass-transistor based latch with respect to $t_{rin}$ and $C_l$, taking appropriate care of its multistage nature and the presence of a feedback loop in it. We express the model coefficients and the model's region of validity as functions of the D-latch size. We use this model to reduce the CMOS latch's characterization time significantly while retaining accuracy in setup time estimation. We do not use a device current or capacitance model in our work, and hence this work is general enough to remain valid with scaling. With this work, we were able to save approximately 66% of the SPICE simulations during standard cell library characterization. We have performed simulations and validated our model using HSPICE.
Delay predicted using our method is in good agreement with that of SPICE simulations, with the said saving in simulation time.

# Contents

| No.   | Title                                                   |
|-------|---------------------------------------------------------|
|       | Candidate's Declaration                                 |
|       | Certificate                                             |
|       | Acknowledgement                                         |
|       | Abstract                                                |
|       | Table of Contents                                       |
|       | List of Figures                                         |
|       | List of Tables                                          |
|       | List of Symbols                                         |
| 1     | Introduction                                            |
| 1.1   | Introduction                                            |
| 1.2   | Static Timing Analysis                                  |
| 1.2.1 | Timing Path                                             |
| 1.2.2 | Net Delay                                               |
| 1.2.3 | Cell Delay                                              |
| 1.2.4 | Representation of Combinational Circuits                |
| 1.2.5 | Representation of Sequential Circuits                   |
| 1.2.6 | Critical Path                                           |
| 1.3   | Delay Models                                            |
| 1.3.1 | Shockley's Delay Model                                  |
| 1.3.2 | Alpha Power Law Delay Model                             |
| 1.3.3 | Polynomial Delay Models                                 |
| 1.3.4 | Look up Table Method                                    |
| 2     | Sequential Circuits                                     |
| 2.1   | Timing constraints [17]                                 |
| 2.2   | Static Latch using Transmission gates [17]              |
| 2.3   | Sequential circuit characterization                     |
| 2.4   | Setup time characterization                             |
| 2.4.1 | Existing characterization Technique of Setup time [7]   |
| 2.5   | Problem Statement                                       |
| 3     | Novel Linear Delay Model and Its Region of Validity     |
| 3.1   | Overview                                                |
| 3.2   | Simulation setup                                        |
| 3.3   | Novel Linear Delay Model                                |
| 3.3.1 | Variation of setup time with input data transition time |
| 3.3.2 | Variation of setup time with capacitive load            |
| 3.4   | Region of validity of linear region delay model         |
| 4     | Impact of Technology Scaling on the Delay Model         |
| 4.1   | Overview                                                |
| 4.1.1 | Delay model Verification at 32nm Technology node        |
| 5     | Results and Conclusion                                  |
| 5.1   | Overview                                                |
| 5.2   | LUT using Delay Model                                   |
| 5.3   | Reduction in Number of Simulations                      |
| 5.4   | Accuracy of LUT generated using the model               |
| 5.5   | Conclusion and Future Work                              |
|       | Bibliography                                            |
|       | Publications                                            |

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | An example illustrating a combinational circuit with its timing graph. |
| 1.2 | An example illustrating the application of CPM on a circuit with inverting gates. |
| 2.1 | An example of a sequential circuit. |
| 2.2 | Representation of two FFs connected through a purely combinational path. |
| 2.3 | Illustration of the clocking parameters of the flip-flop. |
| 2.4 | Static latch using transmission gates [17]. |
| 2.5 | Data-to-Q delay versus setup skew. |
| 3.1 | (a) Setup time variation with input transition time for a given load capacitance; (b) setup time variation with load capacitance for a given input transition time. In both figures, points are simulated data and dotted lines are fits of the delay model. |
| 3.2 | Equivalent circuit when the latch is in transparent mode. |
| 3.3 | (a) Discharging of node I for $T_R = T_{R1}$; (b) discharging of node I for $T_R = T_{R2}$. |
| 3.4 | (a) $T_{DI}$ delay variation with input transition time for a given load capacitance; (b) $K_1$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model. |
| 3.5 | (a) $K_2$ variation with latch width; (b) $K_3$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model. |
| 3.6 | (a) $t_{rbmin}$ variation with load capacitance; (b) $t_{rbmin}$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model. |
| 3.7 | (a) $t_{rb}$ variation with load capacitance for various latch sizes; (b) $\Delta V$ variation with latch width. |
| 3.8 | (a) $V_1$ variation with latch width; (b) $t_{rb}$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model. |
| 3.9 | (a) V(I) at $V_{in} = V_{DD}$; (b) falling transition time at the intermediate node. |
| 3.10 | (a) Delay between nodes I and Q, $T_{IQ}$; (b) delay between nodes D and I, $T_{DI}$. |
| 4.1 | (a) Setup time variation with input transition time for a given load capacitance; (b) setup time variation with load capacitance for a given input transition time. In both figures, points are simulated data and dotted lines are fits of the delay model. |
| 4.2 | (a) $T_{DI}$ delay variation with input transition time for a given load capacitance; (b) $K_1$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model. |
| 4.3 | (a) $K_2$ variation with latch width; (b) $K_3$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model. |
| 4.4 | (a) $t_{rb}$ variation with load capacitance for various latch sizes; (b) $\Delta V$ variation with latch width; (c) $V_1$ variation with latch width; (d) $t_{rb}$ variation with latch width. In all figures, points are simulated data and dotted lines are fits of the delay model. |
| 5.1 | (a) Comparison of setup time from the proposed delay model and HSPICE simulation for $W_p/W_n = 1.5/1$ for varying $t_{rin}$; (b) the same comparison for $W_p/W_n = 3/2$; (c) the same comparison for $W_p/W_n = 4.5/3$. |
| 5.2 | (a) Comparison of setup time from the proposed delay model and HSPICE simulation for $W_p/W_n = 1.5/1$ for varying $C_l$; (b) the same comparison for $W_p/W_n = 4.5/3$. |
| A.1 | Layout of minimum-sized D-latch. |
| A.2 | Layout of 2× minimum-sized D-latch. |
| A.3 | Layout of 3× minimum-sized D-latch. |
| A.4 | Layout of 4× minimum-sized D-latch. |
| A.5 | Layout of 5× minimum-sized D-latch. |

# List of Tables

| Table | Caption |
|-------|---------|
| 1.1 | Delay LUT of a logic gate |
| 5.1 | Savings in the number of HSPICE simulations while characterizing the D-latch library using our DM |

## List of Symbols

| Symbol | Description |
|--------|-------------|
| $W_n$ | Width of NMOS device |
| $C_l$ | Load capacitance |
| $C_p$ | Parasitic capacitance at intermediate (I) node |
| $C_{p,Q}$ | Parasitic capacitance at output (Q) node |
| $\mu$ | Mobility of electrons |
| $C_{ox}$ | Gate oxide capacitance |
| $I_{ds}$ | Drain current of MOSFET |
| $V_{GS}$ | Gate-to-source voltage of MOSFET |
| $V_{DS}$ | Drain-to-source voltage of MOSFET |
| $t_{rin}$ | 0% to 100% transition time of the input signal |
| $T_R$ | Time taken by the (linear) input ramp to go from 0 to $V_{DD}$ |
| $T_{setup}$ | Setup time of D-latch |
| $T_{hold}$ | Hold time of D-latch |
| $L$ | Channel length of MOSFET |
| $V_{TH}$ | Threshold voltage of MOSFET |
| $V_{DD}$ | Power supply voltage |
| $t_{rbmin}$ | For a given load, the minimum value of $t_{rin}$ for which our delay model is valid |
| $t_{rb}$ | For a given load, the maximum value of $t_{rin}$ for which our delay model is valid |
| $V(I)$ | Voltage at intermediate node I |
| $V_{out}$ | Output voltage |
| $V_{in}$ | Input voltage |

# Chapter 1

# Introduction

### 1.1 Introduction

The propagation delay is one of the most important parameters of CMOS digital circuits, affecting both the speed and the dynamic power dissipation of a circuit [10]. The speed of an integrated circuit is characterized by its clock frequency. The setup time, the hold time and the clock period impose delay constraints on a combinational path. The setup time determines the maximum allowed delay of a combinational path, called the setup-time (long-path) constraint. Similarly, the hold time determines the minimum allowed delay of a combinational path, called the hold-time (short-path) constraint. Because of process, voltage and temperature (PVT) variations, many iterations are needed to compute the delay at the various nodes of the circuit. With continuous scaling, transistor-level simulation is becoming more computationally intensive because of the nonlinear transfer characteristics of CMOS gates [10]. Setup time, hold time and combinational data-path delay must be estimated accurately to confirm that the constraints imposed by the clock period are satisfied. As the system clock period decreases, the pessimism imposed by timing verification tools becomes less acceptable [12], so more accurate characterization and verification techniques are highly desirable. Consequently, analytical delay models that do not need many numerical iterations have been the subject of much research.

Circuit delay, including setup time, can be estimated in two ways [1]:

1. Dynamic Timing Analysis (DTA): Circuit simulation using SPICE estimates the delay of a circuit accurately, but it takes a large amount of CPU time for an entire circuit with a large number of transistors. SPICE can take a few seconds to process the individual transistors of a circuit, so processing an entire circuit takes a long time.
2. Static Timing Analysis (STA): An alternative to SPICE that estimates delay reasonably accurately but much faster than DTA is *static timing analysis* (STA). STA uses simple delay models to find the delay of an entire data path and hence takes less time. STA tools are widely used to verify the timing behavior of large digital designs at various stages of the design flow, and they have become the core engine inside circuit optimization tools.

### **1.2** Static Timing Analysis

Static timing analysis is a method for determining whether a circuit meets its timing constraints without simulating it. STA is a crucial part of the modern VLSI chip design process because it is fast and maintains relatively high accuracy compared with dynamic timing analysis using a simulator such as SPICE. STA verifies that the sum of the delay of the launching sequential cell, the delay of the combinational path and the setup time of the next (capturing) sequential cell is smaller than the clock period P. STA neither checks the functionality of the circuit nor requires any input vector generation.
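The setup check above, together with the hold (short-path) check introduced in Section 1.1, reduces to two slack computations. The following is a minimal Python sketch, assuming an ideal clock with zero skew and a single launch/capture pair of sequential cells; the function names, parameter names and the picosecond values in the usage lines are hypothetical and chosen only for illustration.

```python
def setup_slack(clock_period, t_clk_to_q, t_comb_max, t_setup):
    """Setup (long-path) slack; a negative value means a setup violation.

    Data launched at one clock edge must reach the capturing element at
    least t_setup before the next edge, one clock_period later.
    Zero clock skew is assumed.
    """
    return clock_period - (t_clk_to_q + t_comb_max + t_setup)


def hold_slack(t_clk_to_q, t_comb_min, t_hold):
    """Hold (short-path) slack; a negative value means a hold violation.

    New data must not reach the capturing element until t_hold after the
    same clock edge that launched it. Zero clock skew is assumed.
    """
    return (t_clk_to_q + t_comb_min) - t_hold


# Illustrative numbers in picoseconds (not taken from this work).
print(setup_slack(1000, 120, 700, 80))  # 100 -> setup constraint met
print(hold_slack(120, 50, 30))          # 140 -> hold constraint met
```

Keeping the two checks separate mirrors the text: the setup check bounds the longest combinational delay, while the hold check bounds the shortest.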

#### 1.2.1 Timing Path

A timing path is a point-to-point path in a design that can propagate data from one flip-flop to another. Each timing path has one starting point and one end point. The total delay of a timing path is the sum of the net and cell delays along it.

#### 1.2.2 Net Delay

Net delay is the total time required to charge or discharge all the parasitics of a given net. Total net parasitics are affected by

- Net length
- Net fan out

#### 1.2.3 Cell Delay

Cell delay is the delay through a cell element (a logic gate or a sequential element). It is affected by:

- The input transition time (Slew Rate)
- The total load seen by the output transistors in a circuit.

#### **1.2.4** Representation of Combinational Circuits

A combinational logic circuit may be represented by a timing graph $G = (V, E)$, where the elements of $V$ (the vertex set) are the inputs and outputs of the logic gates in the circuit [1]. The vertices are connected by two types of edges:

- One set of edges connects each input of a gate to its output and represents the maximum delay from that input to the output.
- Another set of edges connects the output of each gate to the inputs of its fan-out gates and corresponds to interconnect delay.

Fig. 1.1 shows an example of a combinational logic circuit along with its timing graph. For simplicity, the graph has a single source and a single sink: all primary inputs are assumed to be driven by flip-flops that switch at the same time, so an edge is drawn from the source to each primary input and from each primary output to the sink.

This representation is a directed acyclic graph (DAG); it is acyclic because a combinational circuit generally has no loops.
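Because the timing graph is a DAG, its vertices can be visited in topological order and the latest arrival time at every vertex computed in a single pass (a longest-path computation). The sketch below is a minimal Python illustration, assuming the edges are given as hypothetical `(u, v, delay)` tuples with one source vertex named `"src"`; the vertex names and delays in the usage lines are made up. Kahn's algorithm is used here for the topological order, and it also detects a cycle, which would mean the graph is not a valid combinational timing graph.

```python
from collections import defaultdict, deque


def arrival_times(edges, source="src"):
    """Latest arrival time at every vertex of a timing graph G = (V, E).

    edges: list of (u, v, delay) tuples; gate edges carry cell delay and
    net edges carry interconnect delay. Vertices are processed in
    topological order (Kahn's algorithm), which exists only for a DAG.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, d in edges:
        succ[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))

    arrival = {n: float("-inf") for n in nodes}
    arrival[source] = 0.0
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v, d in succ[u]:
            # Latest (worst-case) arrival over all incoming edges.
            arrival[v] = max(arrival[v], arrival[u] + d)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("timing graph has a cycle; it is not a DAG")
    return arrival


# Hypothetical graph: source -> primary inputs a, b -> ... -> sink.
edges = [("src", "a", 0), ("src", "b", 0), ("a", "x", 2), ("b", "x", 3),
         ("x", "y", 1), ("b", "y", 1), ("y", "snk", 0)]
print(arrival_times(edges)["snk"])  # 4.0
```

The arrival time at the sink is the delay of the longest path (here `b -> x -> y`), which is the quantity the critical path analysis in Section 1.2.6 identifies.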





#### **1.2.5** Representation of Sequential Circuits

Sequential circuits, which consist of combinational elements and sequential elements (flip-flops or latches), may be represented as a set of combinational blocks between latches, and a timing graph may be constructed for each of these blocks [1]. For any such block, the sequential elements that fan out to a gate in the block constitute its primary inputs. Similarly, the sequential elements whose fan-in gate belongs to the block constitute its primary outputs. To identify the blocks, we construct a graph in which each vertex corresponds to a combinational element, and an undirected edge is drawn between each combinational element and the combinational elements it fans out to; sequential elements are left unrepresented. Each connected component of this graph is then one combinational block. Computing the delay of each combinational block is important because the clock period must be greater than the maximum delay of any combinational path plus the setup time of the sequential elements. Hence, finding the critical path and its delay is a vital task.

#### 1.2.6 Critical Path

To determine whether a circuit meets its timing constraints, it is necessary to find its critical path [1]. The critical path is the path with the maximum delay from the primary inputs (PIs) to the primary outputs (POs). The Critical Path Method (CPM) is a technique used to find the critical path. Consider the combinational circuit shown in Fig. 1.2.



In STA, the critical path is usually found using CPM, which we describe here.

Figure 1.2: An example illustrating the application of CPM on a circuit with inverting gates.

Each block in the figure could be a simple logic gate or a combinational block, and is characterized by its delay from each input pin to the output pin. Each block is an inverting logic gate such as NAND or NOR. The numbers $d_r/d_f$ inside each block represent the delays for the output rise transition and the output fall transition, respectively. These delays are obtained from delay models. We also assume that all primary inputs arrive at time zero, so the numbers "0/0" at each primary input represent the worst-case rise and fall arrival times, respectively, at these nodes.

A block is said to be ready for processing when signal arrival times are available for all of its inputs. Therefore, initially only the blocks fed solely by primary inputs are ready for processing; in the example, these are i, j, k and l. At each step, any block that is ready may be chosen. Its worst-case output arrival time is computed by adding the block's delay to the latest input arrival time: since every block is inverting, the output rise arrival time is the latest input fall arrival time plus $d_r$, and the output fall arrival time is the latest input rise arrival time plus $d_f$. The remaining blocks are processed in the same way until the entire circuit has been covered. In our example, the blocks are processed in the order i, j, k, l, m, n, p, o, and the worst-case delay of the entire circuit is $\max(7, 11) = 11$ units.

To find the critical path in the above example, we begin with the final gate output 'o', whose falling transition corresponds to the maximum delay. This transition is caused by the *rising transition at the output of gate* 'n', which must therefore precede 'o' on the critical path. Similarly, the transition at 'n' is caused by the *falling transition at the output of* 'm', and so on. Continuing this process, the critical path is identified as starting from a *falling transition at either* c or d and progressing as follows: rising $j \rightarrow$ falling $m \rightarrow$ rising $n \rightarrow$ falling $o$. The arrows in Fig. 1.2 indicate the critical path.
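The forward CPM pass and the backward trace described above can be sketched in Python as follows. The example uses a small hypothetical three-gate circuit rather than the circuit of Fig. 1.2; the data layout (`blocks` maps a block name to its input names and its $d_r$, $d_f$ delays) and all names are assumptions for illustration. As in the text, every block is assumed to be inverting, so an output rise is caused by an input fall and vice versa.

```python
def cpm_inverting(blocks, primary_inputs):
    """Forward CPM pass for a circuit made only of inverting blocks.

    blocks: {name: (input_names, d_r, d_f)}. Primary inputs arrive at t = 0
    for both transitions. Returns arrival times {node: {"r": t, "f": t}}
    and, for each block transition, the (input, transition) that caused it.
    """
    arrival = {pi: {"r": 0.0, "f": 0.0} for pi in primary_inputs}
    cause = {}
    pending = dict(blocks)
    while pending:
        # A block is ready once arrival times are known for all its inputs.
        ready = [b for b, (ins, _, _) in pending.items()
                 if all(i in arrival for i in ins)]
        if not ready:
            raise ValueError("circuit has a loop or an undriven input")
        for b in ready:
            ins, d_r, d_f = pending.pop(b)
            # Inverting gate: output rise follows the latest input fall,
            # output fall follows the latest input rise.
            src_f = max(ins, key=lambda i: arrival[i]["f"])
            src_r = max(ins, key=lambda i: arrival[i]["r"])
            arrival[b] = {"r": arrival[src_f]["f"] + d_r,
                          "f": arrival[src_r]["r"] + d_f}
            cause[(b, "r")] = (src_f, "f")
            cause[(b, "f")] = (src_r, "r")
    return arrival, cause


def critical_path(arrival, cause, output):
    """Trace back from the worst transition at `output` to a primary input."""
    tr = max(("r", "f"), key=lambda t: arrival[output][t])
    path = [(output, tr)]
    while path[-1] in cause:
        path.append(cause[path[-1]])
    return list(reversed(path))


# Hypothetical circuit: g1 = f(a, b), g2 = f(b, c), g3 = f(g1, g2).
blocks = {
    "g1": (["a", "b"], 2, 1),
    "g2": (["b", "c"], 3, 2),
    "g3": (["g1", "g2"], 2, 4),
}
arrival, cause = cpm_inverting(blocks, ["a", "b", "c"])
print(arrival["g3"])                        # {'r': 4.0, 'f': 7.0}
print(critical_path(arrival, cause, "g3"))  # [('b', 'f'), ('g2', 'r'), ('g3', 'f')]
```

The trace reads exactly like the walk-through in the text: the worst output transition (fall at `g3`) is traced to a rise at `g2`, which in turn is traced to a fall at primary input `b`.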

Thus, in STA, the critical path is determined by CPM, which in turn uses the delays of the individual logic gates. These gate delays are obtained from delay models (DMs). STA uses delay models for fast calculation of the delay of a data path, and its accuracy depends on the accuracy of those models, so the delay models should be as accurate as possible.

### 1.3 Delay Models

In order to find the delay of an entire combinational circuit using STA, we must determine the delay of each of its logic gates. When the load is purely capacitive, the problem reduces to finding the delay of a gate for a specified capacitive load. If the gate comes from a library, its delay is precharacterized under various loads and input transition times $t_{rin}$. Commonly used approaches to delay characterization under capacitive load are:

1. Analytical Delay Models: The delay of a logic gate is found from the output voltage transition across the load capacitor. These models use the current equation of the MOSFET, so their accuracy depends on the accuracy of that equation. The alpha-power law is a typical example.
2. Empirical Delay Models: Empirical models are based on curve fitting of simulation data obtained using SPICE. The polynomial delay model is a typical example.
3. Look-up-Table Method: Gate delays are tabulated for several values of input transition time $(t_{rin})$ and load capacitance $(C_l)$. Given $t_{rin}$ and $C_l$, the delay of that logic gate can be read from the table.

We now discuss several important delay models proposed in the literature.

#### **1.3.1** Shockley's Delay Model

This model calculates the delay using the MOSFET current equation in both the triode (linear) and pentode (saturation) regions.

In this model, the drain current is given by:

$$I_D = 0 \quad (V_{GS} \le V_{TH}:\ \text{cutoff region}) \tag{1.1}$$

$$I_D = K\left[(V_{GS} - V_{TH})V_{DS} - 0.5\,V_{DS}^{2}\right] \quad (V_{DS} < V_{dsat}:\ \text{linear region}) \tag{1.2}$$

$$I_D = 0.5\,K(V_{GS} - V_{TH})^{2} \quad (V_{DS} \ge V_{dsat}:\ \text{saturation region}) \tag{1.3}$$

where $K = \mu C_{ox} W/L$ is the device transconductance parameter and

$$V_{dsat} = V_{GS} - V_{TH} \tag{1.4}$$

is the saturation voltage.

Because of the square-law relationship in saturation, it is also called the square-law model. The model gives fair results for long-channel devices but fails for short-channel devices, for two reasons:

- The drain current in saturation does not follow a square-law relationship with the gate-to-source voltage.
- The drain saturation voltage $V_{dsat}$ differs from the predicted value.
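To make the idea of computing delay from the current equation concrete, the sketch below implements Eqs. (1.1) to (1.4) and integrates $dt = C_l\,dV/I_D$ to obtain the 50% fall delay of an NMOS pull-down driving $C_l$. It is a minimal illustration under simplifying assumptions that are not part of this work: a step input (so $V_{GS} = V_{DD}$ throughout), the PMOS fully off, and no capacitance other than $C_l$. The function names and parameter values are hypothetical.

```python
def shockley_id(v_gs, v_ds, k, v_th):
    """Drain current from Eqs. (1.1)-(1.4); k = mu*C_ox*W/L in A/V^2."""
    if v_gs <= v_th:
        return 0.0                                          # cutoff, Eq. (1.1)
    v_dsat = v_gs - v_th                                    # Eq. (1.4)
    if v_ds < v_dsat:
        return k * ((v_gs - v_th) * v_ds - 0.5 * v_ds**2)  # linear, Eq. (1.2)
    return 0.5 * k * (v_gs - v_th) ** 2                     # saturation, Eq. (1.3)


def step_fall_delay(c_load, v_dd, k, v_th, steps=10_000):
    """50% fall delay of an NMOS discharging c_load after a step input.

    Integrates dt = C dV / I_D from V_out = V_DD down to V_DD/2 with the
    midpoint rule, assuming V_GS = V_DD and the PMOS fully off.
    """
    dv = (v_dd / 2) / steps
    t = 0.0
    for i in range(steps):
        v_out = v_dd - (i + 0.5) * dv  # midpoint of this voltage slice
        t += c_load * dv / shockley_id(v_dd, v_out, k, v_th)
    return t


# Hypothetical values: 10 fF load, 1.0 V supply, k = 200 uA/V^2, V_TH = 0.3 V.
print(step_fall_delay(10e-15, 1.0, 200e-6, 0.3))
```

The integration switches automatically from the saturation expression to the linear one once $V_{out}$ drops below $V_{dsat}$, which is the sense in which the model uses "both regions" to compute delay.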

#### 1.3.2 Alpha Power Law Delay Model

This model extends Shockley's model in the device saturation region [3]. It takes into account the velocity saturation effect, which is predominant in short-channel MOSFETs [3]. Using this model, an equation for logic gate delay can be derived that accounts for the input signal slope.

A full description of the model is given below:

$$I_D = 0 \quad (V_{GS} \le V_{TH}:\ \text{cutoff region}) \tag{1.5}$$

$$I_D = \frac{I'_{D0}}{V'_{D0}}\,V_{DS} \quad (V_{DS} < V'_{D0}:\ \text{linear region}) \tag{1.6}$$

$$I_D = I'_{D0} \quad (V_{DS} \ge V'_{D0}:\ \text{saturation region}) \tag{1.7}$$

where

$$I'_{D0} = I_{D0}\left(\frac{V_{GS} - V_{TH}}{V_{DD} - V_{TH}}\right)^{\alpha} \tag{1.8}$$

$$V'_{D0} = V_{D0}\left(\frac{V_{GS} - V_{TH}}{V_{DD} - V_{TH}}\right)^{\alpha/2} \tag{1.9}$$

The value of $\alpha$ is 2 for Shockley's model; for the alpha-power model, it is 1.2 for n-MOSFETs and 1.5 for p-MOSFETs. The model is based on four parameters: $V_{TH}$ (threshold voltage), $\alpha$ (velocity saturation index), $V_{D0}$ (drain saturation voltage at $V_{GS} = V_{DD}$) and $I_{D0}$ (drain current at $V_{GS} = V_{DS} = V_{DD}$).

As the alpha-power model illustrates, the accuracy of an analytical delay model depends on the accuracy of the MOSFET current equations. For sub-micron technologies these equations are very complex, and their parameters depend on the terminal voltages of the device.
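A minimal Python sketch of Eqs. (1.5) to (1.9) follows. The normalization uses $V_{DD} - V_{TH}$ so that $I'_{D0} = I_{D0}$ at $V_{GS} = V_{DD}$, consistent with the definition of $I_{D0}$. The sketch also evaluates the closed-form inverter delay commonly quoted for this model when the input is a ramp of transition time $t_T$: $t_d = \left(\tfrac{1}{2} - \tfrac{1 - v_T}{1 + \alpha}\right) t_T + \tfrac{C_l V_{DD}}{2 I_{D0}}$ with $v_T = V_{TH}/V_{DD}$. That expression is not derived in this section, so treat it as an assumption intended for fast input transitions; the function names and numerical values are hypothetical.

```python
def alpha_power_id(v_gs, v_ds, v_th, alpha, v_dd, i_d0, v_d0):
    """Alpha-power-law drain current, Eqs. (1.5)-(1.9)."""
    if v_gs <= v_th:
        return 0.0                               # cutoff, Eq. (1.5)
    ratio = (v_gs - v_th) / (v_dd - v_th)
    i_d0_p = i_d0 * ratio**alpha                 # I'_D0, Eq. (1.8)
    v_d0_p = v_d0 * ratio ** (alpha / 2)         # V'_D0, Eq. (1.9)
    if v_ds < v_d0_p:
        return (i_d0_p / v_d0_p) * v_ds          # linear, Eq. (1.6)
    return i_d0_p                                # saturation, Eq. (1.7)


def alpha_power_delay(t_t, c_load, v_dd, v_th, alpha, i_d0):
    """Commonly quoted alpha-power-law inverter delay for a ramp input of
    transition time t_t (assumed form, intended for fast inputs)."""
    v_t = v_th / v_dd
    return (0.5 - (1 - v_t) / (1 + alpha)) * t_t + c_load * v_dd / (2 * i_d0)


# Hypothetical n-MOSFET: alpha = 1.2 as quoted above, V_TH = 0.3 V,
# V_DD = 1.0 V, I_D0 = 100 uA, V_D0 = 0.5 V.
print(alpha_power_id(1.0, 1.0, 0.3, 1.2, 1.0, 100e-6, 0.5))  # equals I_D0 here
print(alpha_power_delay(20e-12, 1e-15, 1.0, 0.3, 1.2, 100e-6))
```

Setting `alpha = 2.0` recovers the square-law dependence of the saturation current on $V_{GS} - V_{TH}$, matching the statement that $\alpha = 2$ corresponds to Shockley's model.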

#### **1.3.3** Polynomial Delay Models

Traditional methods for characterizing a cell driving a load use an equation of the form  $k_1C_l + k_2$ , where  $k_1$  is the characterized slope and  $k_2$  is the intrinsic delay. However, such an equation neglects the effect of input transition on the delay.

The Non-Linear Delay Model (NLDM) from Synopsys uses characterizing equations of the form $\alpha t_{rin} + \beta C_l + \gamma t_{rin}C_l + \delta$.

The Scalable Polynomial Delay Model (SPDM) [4] developed by Synopsys uses a product of polynomials to fit the delay data. For example, for two parameters $C_l$ and $t_{rin}$, the product of an $m^{th}$ order polynomial in $C_l$ and an $n^{th}$ order polynomial in $t_{rin}$, of the form $(a_0 + a_1C_l + \dots + a_mC_l^m)(b_0 + b_1t_{rin} + \dots + b_nt_{rin}^n)$, may be used [4].
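The three fitting forms above can be compared side by side. In this minimal Python sketch the coefficient values are arbitrary placeholders; it also shows that a first-order SPDM product ($m = n = 1$) expands into exactly the four-term form, with $\alpha = a_0 b_1$, $\beta = a_1 b_0$, $\gamma = a_1 b_1$ and $\delta = a_0 b_0$.

```python
from numpy.polynomial import polynomial as P


def linear_delay(c_l, k1, k2):
    """k1*C_l + k2: characterized slope plus intrinsic delay (ignores t_rin)."""
    return k1 * c_l + k2


def four_term_delay(t_rin, c_l, alpha, beta, gamma, delta):
    """alpha*t_rin + beta*C_l + gamma*t_rin*C_l + delta."""
    return alpha * t_rin + beta * c_l + gamma * t_rin * c_l + delta


def spdm_delay(t_rin, c_l, a, b):
    """(a0 + a1*C_l + ... + am*C_l^m) * (b0 + b1*t_rin + ... + bn*t_rin^n).

    Coefficients are listed lowest order first, as numpy's polyval expects.
    """
    return P.polyval(c_l, a) * P.polyval(t_rin, b)


if __name__ == "__main__":
    a, b = [1.0, 2.0], [4.0, 5.0]          # placeholder SPDM coefficients
    t_rin, c_l = 2.0, 3.0
    print(spdm_delay(t_rin, c_l, a, b))    # (1 + 2*3) * (4 + 5*2) = 98.0
    print(four_term_delay(t_rin, c_l, a[0] * b[1], a[1] * b[0], a[1] * b[1], a[0] * b[0]))
```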

To overcome the limitations of these analytical and polynomial models, industry uses look-up tables (LUTs) to obtain the delay of logic gates. The next section discusses the look-up table approach in detail.

#### 1.3.4 Look-up Table Method

The LUT approach is the most popular method for delay calculation in industry because of its accuracy and its speed in STA [5]. As device sizes shrink and input signal speeds increase, device characteristics become harder to predict analytically, which degrades the accuracy of analytical delay models. The LUT approach overcomes the limitations of the analytical and polynomial delay models discussed in Sections 1.3.1, 1.3.2 and 1.3.3. In this method, delay is tabulated for several values of input signal transition time and load capacitance.

The look-up table used for STA is two-dimensional: a gate's delay is characterized with respect to its load capacitance $(C_l)$ and the input signal transition time $(t_{rin})$ [5].

A general look-up table, which tabulates delay against load capacitance and input signal transition time, is shown in Table 1.1.

In the look-up table method, delay is therefore precharacterized only at selected values of $t_{rin}$ and $C_l$, to increase computational efficiency and minimize storage and characterization requirements. Delay at other values of transition time and load capacitance is calculated by linear interpolation, which is accurate only if delay varies approximately linearly with $t_{rin}$ and $C_l$ between the tabulated points.

|             | $C_{load1}$ | $C_{load2}$ | $C_{load3}$ | $C_{load4}$ |
|-------------|-------------|-------------|--------|-------------|
| $t_{r-in1}$ | D11         | D12         | D13    | D14         |
| $t_{r-in2}$ | D21         | D22         | D23    | D24         |
| $t_{r-in3}$ | D31         | D32         | D33    | D34         |
| $t_{r-in4}$ | D41         | D42         | D43    | D44         |
| $t_{r-in5}$ | D51         | D52         | D53    | D54         |

Table 1.1: Delay LUT of a logic gate
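The interpolation step can be sketched in a few lines. The Python example below is a minimal sketch, assuming bilinear interpolation within the enclosing cell of a table laid out like Table 1.1 and linear extrapolation from the edge cell outside it; commercial STA tools may use different schemes. The axis values and the function used to fill the table are hypothetical.

```python
import bisect


def lut_delay(trin_axis, cload_axis, table, t_rin, c_l):
    """Bilinear interpolation in a delay LUT laid out like Table 1.1.

    table[i][j] is the delay at trin_axis[i] and cload_axis[j]; both axes must be
    strictly increasing. Queries outside the table are extrapolated linearly
    from the nearest edge cell.
    """
    # Index of the cell containing the query, clamped to a valid cell.
    i = min(max(bisect.bisect_right(trin_axis, t_rin) - 1, 0), len(trin_axis) - 2)
    j = min(max(bisect.bisect_right(cload_axis, c_l) - 1, 0), len(cload_axis) - 2)
    u = (t_rin - trin_axis[i]) / (trin_axis[i + 1] - trin_axis[i])   # position along t_rin
    v = (c_l - cload_axis[j]) / (cload_axis[j + 1] - cload_axis[j])  # position along C_l
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    return ((1 - u) * (1 - v) * d00 + (1 - u) * v * d01
            + u * (1 - v) * d10 + u * v * d11)


if __name__ == "__main__":
    trin = [10.0, 20.0, 40.0, 80.0, 160.0]   # t_r-in1..5 (ps), hypothetical
    cload = [1.0, 2.0, 4.0, 8.0]             # C_load1..4 (fF), hypothetical
    # Fill D11..D54 from a made-up delay function (ps).
    table = [[2.0 + 0.5 * t + 3.0 * c + 0.1 * t * c for c in cload] for t in trin]
    print(lut_delay(trin, cload, table, 30.0, 3.0))   # ~35.0: the made-up function is bilinear
```

Because the made-up delay function is itself bilinear, interpolation reproduces it exactly; a real gate's delay is only approximately linear, which is why the choice of table points matters.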

However, there are several problems associated with LUTs, which are enumerated in the next chapter. In general, the look-up table method requires $t_{rin}$ and $C_l$ to be selected in an ad-hoc manner, and the table has to be regenerated for each process, voltage and temperature (PVT) variation and for each technology node. The following chapters attempt to overcome some of these problems.


# Chapter 2

# Sequential Circuits

A sequential circuit consists of combinational elements and registers [13]. Sequential circuits generally contain feedback, so their timing paths can also form loops. Consider the sequential circuit shown in Fig. 2.1, in which the outputs of the registers are fed back to the inputs of the combinational block [17]. The combinational block has $I+M$ inputs and $O+M$ outputs. The registers can be built from either edge-triggered flip-flops or level-triggered latches. For the circuit to work correctly at ever-increasing clock frequencies, its timing constraints must be satisfied. These timing parameters are briefly described below.

## 2.1 Timing constraints [17]

The basic parameters associated with a flip-flop or latch can be summarized as follows:

- Setup time: The data input of the register, commonly referred to as the D input, must receive incoming data at least $T_{setup}$ units before the onset of the latching edge of the clock. The data will then be available at the output node Q after the latching edge. The quantity $T_{setup}$ is referred to as the setup time of the flip-flop or latch.



Figure 2.1: An example of a sequential circuit.

- Hold time: The input D must be kept stable for $T_{hold}$ units after the latching edge of the clock, so that the data is stored correctly in the flip-flop or latch. $T_{hold}$ is called the hold time.
- Clock-to-Q delay: Each latch has a delay between the time at which both the data and the clock are available at its inputs and the time at which the data is latched and appears at Q. This is referred to as the clock-to-Q delay $T_{CQ}$.
- Data-to-Q delay: This is the delay between the arrival of the data at the input and its appearance at the output Q while the latch is transparent. It is called the data-to-Q delay $T_{DQ}$.
- Maximum clock frequency: The maximum clock frequency at which the circuit can operate correctly. It is limited by the timing constraints of the sequential circuit, namely setup time, hold time and clock-to-Q delay.

Consider two flip-flops i and j connected only by purely combinational paths, as shown in Fig. 2.2 [1]. Over all such paths $i \rightarrow j$, let the largest delay from FF i to FF j be $\overline{d}(i,j)$ and the smallest delay be $\underline{d}(i,j)$. For an arbitrary flip-flop k, denote the setup time, hold time, and maximum and minimum clock-to-Q delays by $T_{Sk}$, $T_{hk}$, $\Delta_k$ and $\delta_k$, respectively.



Figure 2.2: Representation of two FFs connected through a purely combinational path

Data becomes available at the launching flip-flop i after the clock-to-Q delay and arrives at the latching flip-flop j no later than $\Delta_i + \overline{d}(i,j)$. For correct clocking, the data must arrive one setup time before the latching edge of the clock at FF j, as shown in Fig. 2.3, i.e. no later than $P - T_{Sj}$, where P is the clock period. This gives the relation

$$\Delta_i + \overline{d}(i,j) \leq P - T_{Sj} \tag{2.1}$$

$$\overline{d}(i,j) \leq P - T_{Sj} - \Delta_i \tag{2.2}$$

This constraint is often referred to as the setup time constraint. Since it places an upper bound on the delay of a combinational path, it is also called the long-path constraint.

Figure 2.3: Illustration of the clocking parameters of the flip-flop

The data must remain stable for at least the hold time after the clock edge if it is to be captured correctly by FF j. It is therefore essential that new data does not arrive at FF j before time $T_{hj}$. Since the earliest time at which the incoming data can arrive is $\delta_i + \underline{d}(i,j)$, this gives the following hold time constraint:

$$\delta_i + \underline{d}(i,j) \geq T_{hj} \tag{2.3}$$

$$\underline{d}(i,j) \geq T_{hj} - \delta_i \tag{2.4}$$

Since this constraint puts a lower bound on the combinational delay on a path, it is referred to as a short path constraint. If this constraint is violated then the data in the current clock cycle are corrupted by the data from the next clock cycle.

Thus the designer must measure the delays of the combinational data paths, together with the setup and hold times, to check that these constraints are satisfied and to determine the minimum clock period P.
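The two constraints, together with the minimum clock period implied by inequality 2.1, can be checked mechanically for each path. The Python sketch below assumes, as equations 2.1-2.4 do, an ideal clock with no skew; the `FlipFlop` class and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlipFlop:
    t_setup: float   # T_Sk
    t_hold: float    # T_hk
    cq_max: float    # Delta_k, maximum clock-to-Q delay
    cq_min: float    # delta_k, minimum clock-to-Q delay


def check_path(launch, capture, d_max, d_min, period):
    """Return (setup_ok, hold_ok) for a path launch -> capture, per (2.2) and (2.4)."""
    setup_ok = d_max <= period - capture.t_setup - launch.cq_max   # long-path constraint
    hold_ok = d_min >= capture.t_hold - launch.cq_min              # short-path constraint
    return setup_ok, hold_ok


def min_period(launch, capture, d_max):
    """Smallest clock period P satisfying (2.1) for this path."""
    return launch.cq_max + d_max + capture.t_setup


if __name__ == "__main__":
    ff_i = FlipFlop(t_setup=30.0, t_hold=20.0, cq_max=50.0, cq_min=40.0)  # ps
    ff_j = FlipFlop(t_setup=30.0, t_hold=20.0, cq_max=50.0, cq_min=40.0)
    print(check_path(ff_i, ff_j, d_max=400.0, d_min=10.0, period=500.0))  # (True, True)
    print(min_period(ff_i, ff_j, d_max=400.0))                             # 480.0
```

Note that the hold check does not depend on the period, which is why hold violations cannot be fixed by slowing the clock.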

## 2.2 Static Latch using Transmission gates [17]

A latch is a level-triggered (level-sensitive) storage element. The most robust and common technique for building a latch uses a transmission gate multiplexer. Fig. 2.4 shows an implementation of a positive static latch based on a multiplexer. In a positive latch, the D input is selected when the clock is high and is passed to the output Q; the latch is then in transparent mode. When the clock is low, the output holds its value through the feedback loop formed by two back-to-back inverters; the latch is then in hold (opaque) mode. Because the stored value is maintained by this feedback loop, transistor sizing is not critical for correct functionality, and this latch is therefore called a static latch. Fig. A.1 shows the layout of a minimum-size static D-latch. To increase the size of the latch, we use fingering; the layouts using fingering are shown in the Appendix.



Figure 2.4: Static Latch using Transmission Gate [17]

Assume that the propagation delay of each inverter is $t_{pd,inv}$ and the propagation delay of the transmission gate is $t_{pd,tx}$. Also assume that the contamination delay is 0 and that the inverter deriving $\overline{CLK}$ from CLK has zero delay. The setup time is the time before the latching edge of the clock (the falling edge, for this positive latch) by which the input data D must be valid. For the transmission gate multiplexer-based latch, the input D has to propagate through $I_0$, $T_0$, $I_1$ and $I_2$ before the latching edge of the clock. This ensures that the node voltages on both terminals of the transmission gate $T_1$ are at the same value; otherwise, the cross-coupled pair $I_1$ and $I_2$ may settle to an incorrect value. The setup time is therefore $3t_{pd,inv} + t_{pd,tx}$.

### 2.3 Sequential circuit characterization

STA tools rely on data described in cell libraries to analyze a circuit [7]. The characterization of the individual cells in a cell library is therefore critical to the accuracy of STA results. In particular, the setup and hold time constraints of the sequential cells are used to verify the timing of a synchronous circuit. Cycle times have been shrinking dramatically, driven both by faster gate delays and by more aggressive designs that use fewer gates per cycle. As a result, setup time accounts for a significant portion of the clock period, which makes modeling of setup time critical [14]. For this analysis, setup time is calculated from SPICE simulation and stored in cell libraries.

### 2.4 Setup time characterization

For STA, setup time has to be characterized. A common approach for calculating the setup time of sequential elements is discussed in the next section.

#### 2.4.1 Existing Characterization Technique for Setup Time [7]

A common approach to characterizing setup time is to examine the relationship between setup skew and data-to-Q delay at a fixed hold skew. Three regions can be identified in Fig. 2.5: the stable (no failure) region, the metastable region and the failure (malfunction) region. In the stable region, the data-to-Q delay is independent of setup skew. As the skew decreases, the data-to-Q delay starts to rise exponentially. If the skew is too small, the latch fails to capture the data; this is the failure region. The region between the stable and failure regions is called the metastable region.

The setup (hold) time is usually set to the setup skew at which the stable region crosses over into the metastable region. There are different approaches to identifying this crossover point. In some approaches, the crossover point is the skew at which a certain degradation in the data-to-Q delay occurs; for example, a 5% degradation is commonly used as the reference in industry. In other approaches, the crossover point is the skew at which the sum of setup skew and data-to-Q delay is minimized. At this point, the slope of the data-to-Q delay versus setup skew curve is -1, i.e. 45 degrees in magnitude.



Figure 2.5: Data-to-Q delay versus setup skew
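Both crossover criteria are easy to automate once the data-to-Q delay can be evaluated at any setup skew. In practice each evaluation is a SPICE run; in the Python sketch below a hypothetical closed-form curve, $d_{2q}(s) = d_0 + a\,e^{-s/\tau}$, stands in for the simulator so that the results can be checked analytically. The function names and parameter values are illustrative assumptions.

```python
import math


def setup_by_degradation(d2q, s_lo, s_hi, degradation=0.05, tol=1e-6):
    """Bisection for the setup skew at which D-to-Q delay is `degradation` above its stable value.

    s_lo must lie in the metastable region and s_hi deep in the stable region;
    d2q(s_hi) is used as the stable (reference) delay.
    """
    target = (1.0 + degradation) * d2q(s_hi)
    while s_hi - s_lo > tol:
        mid = 0.5 * (s_lo + s_hi)
        if d2q(mid) > target:
            s_lo = mid      # still degraded: crossover lies at larger skew
        else:
            s_hi = mid
    return s_hi


def setup_by_min_sum(d2q, s_lo, s_hi, iters=100):
    """Golden-section search for the skew minimising setup skew + D-to-Q delay.

    At the minimum, d(D-to-Q)/d(skew) = -1 (the 45-degree point).
    Assumes skew + d2q(skew) is unimodal on [s_lo, s_hi].
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = s_lo, s_hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if c + d2q(c) < d + d2q(d):
            b = d           # minimum lies in [a, d]
        else:
            a = c           # minimum lies in [c, b]
    return 0.5 * (a + b)


if __name__ == "__main__":
    d0, amp, tau = 50.0, 200.0, 10.0          # ps, hypothetical

    def curve(s):
        return d0 + amp * math.exp(-s / tau)

    print(setup_by_degradation(curve, 0.0, 200.0))   # close to tau*ln(amp/(0.05*d0))
    print(setup_by_min_sum(curve, 0.0, 200.0))       # close to tau*ln(amp/tau)
```

For this curve the two criteria give noticeably different setup times (about 43.8 ps versus 30.0 ps), which illustrates why the chosen criterion must be stated alongside the characterized value.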

The approach described above thus uses SPICE simulation to calculate the setup time stored in the standard cell library, which makes setup time characterization very time consuming. To overcome this drawback of SPICE simulation, we use the LUT approach to characterize setup time. For fast calculation and simplicity, a linear delay model for setup time is highly desirable in the LUT approach. In the next chapter, we derive and validate a linear delay model for setup time.

### 2.5 Problem Statement

- Presently in the LUT method, $t_{rin}$ and $C_l$ values are selected in an ad-hoc manner, so a systematic way of choosing these values is needed to improve accuracy.
- These characterization tables have to be regenerated for every process, voltage and temperature (PVT) condition, and the LUTs are characterized for several sizes of each logic gate. This requires a huge characterization effort.
- The accuracy of the delay obtained by the LUT approach depends on the size of the LUT. Increasing the size increases accuracy, but also increases memory storage and simulation time.
- No linear delay model has been proposed for sequential cells, owing to the presence of multi-stage cells and feedback loops.

In this work, we propose a solution to the above problems with existing LUTs: a linear delay model for setup time, together with its region of validity as a function of the cell size, which enables an optimum choice of $t_{rin}$ and $C_l$. Using this model and its region of validity, we choose appropriate values of $t_{rin}$ and $C_l$ for generating the LUTs. Our approach can also be applied across technology scaling.

# Chapter 3

# Novel Linear Delay Model and Its Region of Validity

### 3.1 Overview

In this chapter we derive a linear delay model for the setup time of a sequential circuit. The model is physically based: its coefficients and its region of validity are obtained from physical arguments with very few SPICE simulations. We make several assumptions while developing the model and justify them later. The following sections describe the simulation setup and then derive the model using physical reasoning.

### 3.2 Simulation setup

In the next section, we derive a linear relation of setup time with data transition time and load capacitance. We perform this analysis for a positive level D-latch. We use 45nm technology PTM files<sup>2</sup> for HSPICE simulation. The layouts were drawn in the Cadence Virtuoso layout editor 6.1.3, and the SPICE netlists were extracted from the layouts. In the derivation, we assume that CLK and $\overline{CLK}$ are pulse inputs with no skew or overlap between them. We derive the relationship for a rising D input, where D rises from 0 to $V_{DD}$ and the output Q rises from 0 to $V_{DD}$; $V_{DD}$ is the power supply voltage, taken as 1 V. The data transition time $t_{rin}$ is the time taken by the D input for its 0 to 100% transition (from 0 to $V_{DD}$) at the input of inverter $I_0$. We assume a linear input ramp, $V_{GS} = \frac{V_{DD}}{t_{rin}}t$, where $V_{GS}$ is the gate-source voltage of the NMOS device of inverter $I_0$ and t is time. In this work, the word "delay" stands for 50% delay unless stated otherwise. We first derive the model for a minimum-sized latch and then verify its validity for increasing latch widths, obtained by fingering as shown in the Appendix. The D-latch is simulated with $W_p$ and $W_n$ adjusted such that the rise and fall transition times are equal.

### 3.3 Novel Linear Delay Model

In this section, using physical arguments, we show that delay varies linearly with the load capacitance $C_l$ and the input transition time $t_{rin}$ when these parameters lie within a certain range. Since the setup time is the skew at which the data-to-Q delay $T_{DQ}$ rises by a fixed amount (say 5%), and $T_{DQ}$ varies linearly with $t_{rin}$ and $C_l$, the setup time also varies linearly with $t_{rin}$ and $C_l$. Fig. 3.1(a) shows the variation of setup time with data transition time $t_{rin}$, and Fig. 3.1(b) shows its variation with load capacitance $C_l$. Since setup time varies linearly with both, we model it as a linear function of $t_{rin}$ and $C_l$, given by equation 3.1.

Setup time delay:

$$D = K_1 T_R + K_2 C_l + K_3 \tag{3.1}$$

where $D$ is the setup time, $T_R = t_{rin}$ is the data transition time, and $K_1$, $K_2$, $K_3$ are constants extracted by fitting the model to HSPICE simulation data. In the next section we derive the variation of setup time with data transition time $t_{rin}$.
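Extracting $K_1$, $K_2$ and $K_3$ is an ordinary linear least-squares problem. The sketch below uses NumPy's `numpy.linalg.lstsq`; the sample points are synthetic, generated from assumed coefficients rather than taken from the HSPICE data of this work. In practice only points inside the region of validity (Section 3.4) should be included in the fit.

```python
import numpy as np


def fit_setup_model(t_rin, c_l, setup):
    """Least-squares fit of D = K1*t_rin + K2*C_l + K3, equation (3.1).

    Returns (K1, K2, K3). Inputs are 1-D sequences of equal length.
    """
    t_rin = np.asarray(t_rin, dtype=float)
    c_l = np.asarray(c_l, dtype=float)
    setup = np.asarray(setup, dtype=float)
    design = np.column_stack([t_rin, c_l, np.ones_like(t_rin)])
    (k1, k2, k3), *_ = np.linalg.lstsq(design, setup, rcond=None)
    return k1, k2, k3


if __name__ == "__main__":
    # Synthetic grid (hypothetical units: ps and fF) built from assumed coefficients.
    tt, cc = np.meshgrid([20.0, 40.0, 60.0, 80.0], [1.0, 2.0, 4.0, 8.0])
    tt, cc = tt.ravel(), cc.ravel()
    rng = np.random.default_rng(0)
    d = 0.2 * tt + 1.5 * cc + 20.0 + rng.normal(0.0, 0.05, tt.size)  # small "measurement" noise
    print(fit_setup_model(tt, cc, d))   # close to (0.2, 1.5, 20.0)
```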



Figure 3.1: (a) Setup time variation with input transition time for a given load capacitance, (b) Setup time variation with load capacitance for a given input transition time. In both figures, points are simulated data and dotted lines are fits of the delay model.

#### 3.3.1 Variation of setup time with input data transition time

In this section, we show that the setup time varies linearly with $t_{rin}$ for a given load capacitance $C_l$. The data input D rises from 0 to $V_{DD}$ and the output Q rises from 0 to $V_{DD}$. We assume a linear transition of the data input, so $V_{GS} = (V_{DD}/t_{rin})\,t$, where $V_{GS}$ is the gate-to-source voltage of the NMOS device of inverter $I_0$, $V_{DD}$ is the power supply voltage and t is time. When the latch is in transparent mode, CLK and $\overline{CLK}$ are at $V_{DD}$ and Gnd, respectively. As shown in Fig. 3.2, for a rising transition of the D input, node I discharges from $V_{DD}$ to Gnd through a capacitance $C_I + C_p$, where $C_I$ is the input capacitance of inverter $I_1$ and $C_p$ is the parasitic capacitance at the intermediate node I. The discharge of node I therefore comprises two regions: the first while the D input transitions from 0 to $V_{DD}$, and the second after the D input has settled at $V_{DD}$:

- Region 1 ($0 \le t \le t_{rin}$): the NMOS device of inverter $I_0$ is in the linear region, and node I discharges through an average resistance $R_{avg}$.
- Region 2 ($t > t_{rin}$): the NMOS device of inverter $I_0$ is in the linear region, and node I discharges through the on-resistance $R_{on}$.



Figure 3.2: Equivalent circuit when the latch is in transparent mode.

If the load capacitance is such that the NMOS device of inverter $I_0$ remains in saturation while D rises from 0 to $V_{DD}$, the charge $\Delta Q(t_{rin})$ removed from node I between $t = 0$ and $t = t_{rin}$ is

$$\Delta Q(t_{rin}) = \int_0^{t_{rin}} I_{ds}\, dt \tag{3.2}$$

$$= t_{rin} \int_{0}^{1} f\left(\frac{V_{GS}}{V_{DD}}, \frac{V_{DS}}{V_{DD}} \cong 1\right) d\left(\frac{V_{GS}}{V_{DD}}\right) \tag{3.3}$$

$$= t_{rin} \int_0^1 f(x, y = 1)\, dx \tag{3.4}$$

$$= S_T t_{rin} \tag{3.5}$$

$$V_I(t_{rin}) = V_{DD} - \frac{\text{charge removed from node I between } t = 0 \text{ and } t = t_{rin}}{C_I + C_p} \tag{3.6}$$

$$V_I(t_{rin}) = V_{DD} - \frac{S_T t_{rin}}{C_I + C_p} \tag{3.7}$$

Here, $I_{ds} = f(\frac{V_{GS}}{V_{DD}}, \frac{V_{DS}}{V_{DD}})$ is the NMOS drive current, $x = V_{GS}/V_{DD}$ and $y = V_{DS}/V_{DD}$, and $S_T$ is a constant proportional to $W_n$, since the NMOS current is proportional to the device width ($I_{ds} \propto W_n$). Expressing the current as a general function of $V_{GS}$ and $V_{DS}$ allows second-order effects to be included. We assume $y = V_{DS}/V_{DD} \cong 1$ because the NMOS device operates in saturation. $V_I(t_{rin})$ is the voltage at node I at time $t = t_{rin}$. We also assume that the PMOS device is very weak compared to the NMOS device, because the input is rising.
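Equations 3.4-3.7 can be evaluated numerically for any current model $f$. The Python sketch below assumes, purely for illustration, that $f$ is the alpha-power law saturation current of Section 1.3.2 with hypothetical parameters; for that choice $S_T$ has the closed form $I_{D0}(1 - V_{TH}/V_{DD})/(\alpha + 1)$, which the numerical integral should reproduce. The function names and capacitance values are assumptions.

```python
def f_alpha(x, y, i_d0=1e-4, v=0.3, alpha=1.3):
    """Hypothetical normalised NMOS current f(x, y) in saturation (alpha-power law).

    x = V_GS/V_DD, y = V_DS/V_DD (y is ignored: saturation current), v = V_TH/V_DD.
    """
    return i_d0 * ((x - v) / (1.0 - v)) ** alpha if x > v else 0.0


def s_t(f, n=20_000):
    """S_T = integral_0^1 f(x, y=1) dx, equation (3.4), by the composite trapezoidal rule."""
    h = 1.0 / n
    total = 0.5 * (f(0.0, 1.0) + f(1.0, 1.0))
    total += sum(f(k * h, 1.0) for k in range(1, n))
    return total * h


def v_node_i(t_rin, s_t_value, c_i, c_p, vdd=1.0):
    """Voltage at node I at t = t_rin, equation (3.7)."""
    return vdd - s_t_value * t_rin / (c_i + c_p)


if __name__ == "__main__":
    st = s_t(f_alpha)
    print(st, 1e-4 * (1.0 - 0.3) / (1.3 + 1.0))             # numerical vs closed form (A)
    print(v_node_i(20e-12, st, c_i=1.5e-15, c_p=0.5e-15))   # hypothetical 20 ps, 2 fF
```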

Assumption 1: As we show later, for a large number of points in the LUT the NMOS device of inverter $I_0$ is in the linear region during the first region, where $V_{in} \le V_{DD}$. In region 1, the input D rises from 0 to $V_{DD}$ and node I discharges from $V_{DD}$ to $V_1$ for transition time $T_{R1}$, and to $V_2$ for transition time $T_{R2}$, as shown in Fig. 3.3. $V_1$ and $V_2$ are given by the exponential discharge equations:

$$V_1 = V_{DD} \exp\left(\frac{-T_{R1}}{R_{avg}(C_I + C_p)}\right) \tag{3.8}$$

$$V_2 = V_{DD} \exp\left(\frac{-T_{R2}}{R_{avg}(C_I + C_p)}\right) \tag{3.9}$$

Here

$$R_{avg} = R_{avg,NMOS} + R_{avg,TG} \tag{3.10}$$




Figure 3.3: (a) Discharging of I node for  $T_R = T_{R1}$ , (b) Discharging of I node for  $T_R = T_{R2}$ .

where $R_{avg,TG}$ is the average resistance of the transmission gate and $R_{avg,NMOS}$ is the average resistance of the NMOS device in the linear region, which is inversely proportional to its gate overdrive:

$$R_{avg,NMOS} \propto \frac{1}{V_{in} - V_{TH}} = \frac{1}{\frac{V_{DD}}{T_R}t - V_{TH}} \tag{3.11}$$

Assumption 2: If the voltage at node I, V(I), discharges approximately linearly in this region, i.e. $T_R \ll R_{avg}(C_I + C_p)$ so that $e^{-u} \approx 1 - u$, equations 3.8 and 3.9 can be approximated as

$$V_1 = V_{DD} \left( 1 - \frac{T_{R1}}{R_{avg}(C_I + C_p)} \right) \tag{3.12}$$

$$V_2 = V_{DD} \left( 1 - \frac{T_{R2}}{R_{avg}(C_I + C_p)} \right) \tag{3.13}$$


Denoting the difference between $V_2$ and $V_1$ by $\Delta V$:

$$\Delta V = V_2 - V_1 = \frac{V_{DD}}{R_{avg}(C_I + C_p)} (T_{R1} - T_{R2}) \tag{3.14}$$


So the difference $\Delta V$ between the two discharge voltages is a linear function of the transition-time difference $T_{R1} - T_{R2}$.

In region 2, the NMOS device of inverter $I_0$ is in the linear region. For the 50% to 80% delay between the D input and node I, V(I) discharges from $V_1$ to $0.2V_{DD}$ in time $t_1$ for transition time $T_{R1}$, and from $V_2$ to $0.2V_{DD}$ in time $t_2$ for transition time $T_{R2}$, through the on-resistance $R_{on} = R_{on,NMOS} + R_{on,TG}$. Using the exponential discharge equation:

$$0.2V_{DD} = V_1 \exp\left(\frac{-t_1}{R_{on}(C_I + C_p)}\right) \tag{3.15}$$

$$0.2V_{DD} = V_2 \exp\left(\frac{-t_2}{R_{on}(C_I + C_p)}\right) \tag{3.16}$$

Dividing equation 3.16 by equation 3.15 and taking logarithms gives:

$$R_{on}(C_I + C_p) \ln \frac{V_2}{V_1} = t_2 - t_1 \tag{3.17}$$

Since

$$V_2 = V_1 + \Delta V \tag{3.18}$$

$$R_{on}(C_I + C_p) \ln \frac{V_1 + \Delta V}{V_1} = t_2 - t_1 \tag{3.19}$$

so

$$R_{on}(C_I + C_p) \ln\left(1 + \frac{\Delta V}{V_1}\right) = t_2 - t_1 \tag{3.20}$$

Assumption 3: For a small difference $\Delta T_R = T_{R1} - T_{R2}$, $\Delta V$ is small compared with $V_1$, so we can use the approximation

$$\ln(1 + x) \cong x \tag{3.21}$$

for small $x$, which gives

$$R_{on}(C_I + C_p)\left(\frac{\Delta V}{V_1}\right) = t_2 - t_1 \tag{3.22}$$

Substituting $\Delta V$ from equation 3.14 and $V_1$ from equation 3.12 into equation 3.22, we get:

$$R_{on}(C_I + C_p) \left( \frac{T_{R1} - T_{R2}}{R_{avg}(C_I + C_p) - T_{R1}} \right) = t_2 - t_1 \tag{3.23}$$

For transition time $T_{R2}$, the delay is $D = \frac{T_{R2}}{2} + t_2$. Equation 3.23 shows that, for a given $T_R = T_{R1}$ and load $C_l$, the delay difference $t_2 - t_1$, and hence D, is a linear function of $T_{R2}$.
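The effect of Assumption 3 can be checked numerically by comparing $t_2 - t_1$ from the logarithmic form (3.20), with $V_1$ and $V_2$ taken from (3.12)-(3.13), against the linearised result (3.23). The Python sketch below works in normalised time units; the values of $R_{avg}(C_I + C_p)$, $R_{on}(C_I + C_p)$ and the transition times are hypothetical.

```python
import math


def delay_shift_log(tr1, tr2, rc_avg, rc_on):
    """t2 - t1 from (3.20), using the linearised V1, V2 of (3.12)-(3.13)."""
    v1 = 1.0 - tr1 / rc_avg          # V1 / V_DD
    v2 = 1.0 - tr2 / rc_avg          # V2 / V_DD
    return rc_on * math.log(v2 / v1)


def delay_shift_linear(tr1, tr2, rc_avg, rc_on):
    """t2 - t1 from (3.23), i.e. after applying ln(1 + x) ~ x (Assumption 3)."""
    return rc_on * (tr1 - tr2) / (rc_avg - tr1)


if __name__ == "__main__":
    rc_avg, rc_on, tr1 = 100.0, 20.0, 30.0    # hypothetical, normalised time units
    for tr2 in (29.0, 25.0, 15.0, 5.0):
        exact = delay_shift_log(tr1, tr2, rc_avg, rc_on)
        approx = delay_shift_linear(tr1, tr2, rc_avg, rc_on)
        print(f"T_R2={tr2:5.1f}  log={exact:7.4f}  linear={approx:7.4f}  "
              f"rel.err={approx / exact - 1:6.2%}")
```

With these values the linear form is within about 1% for $T_{R1} - T_{R2} = 1$ but overestimates by roughly 17% for $T_{R1} - T_{R2} = 25$: the error grows with $\Delta V / V_1$, which is the mechanism behind the upper bound of the model's validity.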

Further we make the following observations:

- Observation 1: For a given $T_R = T_{R1}$ and load $C_l$, the delay between the data input D and node I varies linearly with $T_{R2}$ within the region where Assumptions 1 and 3 are valid.
- Observation 2: The delay coefficient $K_1$ is governed by the ratio

$$\frac{R_{on}(C_I + C_p)}{R_{avg}(C_I + C_p) - T_{R1}} \tag{3.24}$$

which is independent of the width of the latch: $R_{on}$ and $R_{avg}$ scale as $1/W_n$ while $C_I + C_p$ scales as $W_n$, so the products $R_{on}(C_I + C_p)$ and $R_{avg}(C_I + C_p)$ do not depend on width.

We find that equation 3.1 fits the data well between a lower bound $t_{rbmin}$ and an upper bound $t_{rb}$ on $t_{rin}$. Figs. 3.4(a) and 3.4(b) verify Observations 1 and 2 made in this section.

#### 3.3.2 Variation of setup time with capacitive load

In this section, we show that delay varies linearly with $C_l$ for a given data transition time $t_{rin}$. Again we assume $V_{GS} = \frac{V_{DD}}{T_{R,data}}t$, where $T_{R,data} = t_{rin}$.

Figure 3.4: (a) $T_{DI}$ delay variation with input transition time for a given load capacitance, (b) $K_1$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model.

Assumption 4: The capacitance at the intermediate node I is almost constant. Hence, for a constant $T_{R,data}$, the delay between D and node I is constant, and the falling transition at node I is also constant. As the load capacitance increases, the capacitance at the intermediate node changes only slightly, because the load is coupled to node I through the series combination of $C_l$ and $C_{GD}$ of inverter $I_1$. We validate this assumption later in this chapter. If Assumption 4 holds, the delay between I and Q is a linear function of $C_l$ for a constant transition at node I. Node I falls from $V_{DD}$ to 0 and the output Q rises from 0 to $V_{DD}$.

If the PMOS device of inverter $I_1$ is in saturation, the output charges from 0 to $\frac{V_{DD}}{2}$ for the 50% delay. The output charge is

$$\Delta Q = \int_0^{t_{rout,I}} I_{on} \, dt$$

where $t_{rout,I}$ is the falling transition time at the intermediate node. The output charges from 0 to $\frac{V_{DD}}{2}$ through $C_l$ and $C_{p,Q}$, where $C_{p,Q}$ is the parasitic capacitance at the output node Q. So,

$$(C_l + C_{p,Q})\frac{V_{DD}}{2} = I_{on} \Delta t \tag{3.25}$$

$$\Delta t = \frac{(C_l + C_{p,Q})\frac{V_{DD}}{2}}{I_{on}} \tag{3.26}$$

Here $I_{on} = f(\frac{V_{GS}}{V_{DD}}, \frac{V_{DS}}{V_{DD}})$ is the PMOS on-current. We assume that the NMOS device is very weak compared to the PMOS device, because node I is falling.

Since $I_{on} \propto W_p$ and $\Delta t \propto \frac{1}{I_{on}}$, the delay $\Delta t \propto \frac{1}{W_p}$.

If the PMOS device of inverter $I_1$ is instead in the linear region while Q charges from 0 to $\frac{V_{DD}}{2}$, the output node can be treated as an RC network and the delay is proportional to $R_{on}(C_l + C_{p,Q})$. Since $R_{on} \propto \frac{1}{W_p}$, the delay is again proportional to $\frac{1}{W_p}$ for a given load capacitance. Thus, in both the linear and saturation regions, the delay is proportional to $\frac{1}{W_p}$ for a given load capacitance $C_l$.

Further we make the following observations:

- Observation 3: $K_2$ is a linear function of $\frac{1}{W_p}$.
- Observation 4: $K_3$ is a linear function of $\frac{1}{W_p}$.

We verified Observations 3 and 4 through HSPICE simulation; Figs. 3.5(a) and 3.5(b) confirm them.

We validate all the assumptions made above in Section 3.4 and use them to optimize the setup time delay LUT, as explained in Chapter 1.

Figure 3.5: (a) $K_2$ variation with latch width, (b) $K_3$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model.

In Section 3.3.1 we made Assumption 1, that the NMOS device of inverter $I_0$ is in the linear region from 0 to $t_{rin}$; this assumption gives the lower limit of the region of validity of the model. We also made Assumption 3 using equation 3.21: when $x = \frac{\Delta V}{V_1}$ becomes comparable to 1, the model diverges from the simulated delay, which gives the upper limit of the model. In the next section, we determine the region of validity of the model given by equation 3.1.

### 3.4 Region of validity of linear region delay model

In this section, we determine the region of validity of our delay model. For the lower limit, we find the range of $t_{rin}$ and $C_l$ in which the NMOS device of inverter $I_0$ is in the linear region. From equation 3.7, we require

$$V(I)\big|_{t = t_{rin}} = V_{DD} - \frac{S_T t_{rin}}{C_I + C_p} \le V_{DD} - V_{TH} \tag{3.27}$$

$$\Delta Q(t_{rin}) = S_T t_{rin} \ge (C_I + C_p) V_{TH} \tag{3.28}$$

For a given $C_l$, the linear delay model of equation 3.1 is valid for all values of $t_{rin}$ that satisfy equation 3.28. We denote the minimum value of $t_{rin}$ that satisfies equation 3.28 by $t_{rbmin}$. From equation 3.28, $t_{rbmin}$ is a linear function of $C_I$:

$$t_{rbmin} = \frac{C_I V_{TH}}{S_T} + \frac{C_p V_{TH}}{S_T} \tag{3.29}$$

We extract the slope and intercept of this linear function by fitting it to SPICE simulation data. From equation 3.29 we observe that:

- Observation 5: $t_{rbmin}$ is independent of the load capacitance $C_l$, since $C_I$ is almost independent of $C_l$.
- Observation 6: The intercept is constant with $W_n$, because

$$S_T \propto W_n \tag{3.30}$$

and

$$C_p \propto W_n \tag{3.31}$$

and also

$$C_I \propto W_n \tag{3.32}$$


As $C_l$ changes, $C_I$ remains almost constant, because $C_l$ enters only through a series combination:

$$C_I = C_{constant} + \frac{C_{GD}C_l}{C_{GD} + C_l} \tag{3.33}$$

$$C_I = C_{constant} + C_{GD} \left( \frac{1}{1 + \frac{C_{GD}}{C_l}} \right) \tag{3.34}$$

Since $C_{GD} \ll C_l$, a binomial expansion gives

$$C_I \approx C_{constant} + C_{GD} \left( 1 - \frac{C_{GD}}{C_l} \right)$$

so as $C_l$ increases, $C_I$ increases only slightly, towards $C_{constant} + C_{GD}$. The change in $C_I$ is so small that it can be treated as a constant, so $t_{rbmin}$ remains almost constant with varying $C_l$.

Figures 3.6(a) and 3.6(b) confirm Observations 5 and 6, respectively.
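Observations 5 and 6 follow directly from equations 3.29 and 3.33, which a few lines of Python can illustrate. In this sketch all capacitances, currents and the upsizing factors are hypothetical, and $S_T$, $C_I$ and $C_p$ are assumed to scale exactly in proportion to $W_n$, as in equations 3.30-3.32.

```python
def t_rbmin(c_i, c_p, s_t, v_th):
    """Lower bound of the validity region, equation (3.29)."""
    return (c_i + c_p) * v_th / s_t


def c_node_i(c_const, c_gd, c_l):
    """C_I from equation (3.33): a fixed part plus C_GD in series with C_l."""
    return c_const + c_gd * c_l / (c_gd + c_l)


if __name__ == "__main__":
    v_th = 0.3                                   # V, hypothetical
    c_i, c_p, s_t = 1.5e-15, 0.5e-15, 3.0e-5     # minimum-size latch, hypothetical
    for k in (1, 2, 4):                          # upsizing factor of W_n
        print(k, t_rbmin(k * c_i, k * c_p, k * s_t, v_th))    # unchanged with width
    for c_l in (2e-15, 5e-15, 20e-15):           # load sweep, hypothetical
        print(c_l, c_node_i(1.2e-15, 0.1e-15, c_l))           # nearly constant C_I
```

For a tenfold load sweep the computed $C_I$ changes by well under 1%, consistent with treating $t_{rbmin}$ as independent of $C_l$.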

Assumption 3 fails for larger values of $t_{rin}$, where $x = \frac{\Delta V}{V_1}$ in equation 3.21 becomes comparable to 1. This gives the maximum value of $t_{rin}$ that satisfies the linear delay model of equation 3.1, which we denote by $t_{rb}$. Since $t_{rb}$ depends on the ratio $\frac{\Delta V}{V_1}$, it changes as this ratio changes.

- Observation 7: $t_{rb}$ depends on the discharge of the intermediate node I at $t_{rin}$, which is not affected by variation of $C_l$. So $t_{rb}$ should be independent of $C_l$.
- Observation 8: $\Delta V$ is the difference between the intermediate node voltages at two different values of $T_R$. As the width $W_n$ increases, both voltages discharge faster, and their difference $\Delta V$ remains almost constant. Therefore $\Delta V$ is constant with $W_n$.
- Observation 9: $V_1$ decreases as $W_n$ increases, because the drive current $I_{on}$ increases. Since

$$V_1 \propto \frac{1}{I_{on}}, \quad \text{i.e.} \quad V_1 \propto \frac{1}{W_n}, \tag{3.35}$$

it follows that

$$t_{rb} \propto \frac{1}{W_n}. \tag{3.36}$$

Figure 3.6: (a) $t_{rbmin}$ variation with load capacitance, (b) $t_{rbmin}$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model.

Figures 3.7(a)-3.7(b) and 3.8(a)-3.8(b) verify Observations 7-9; in particular, $t_{rb}$ depends linearly on $\frac{1}{W_n}$. This section thus verifies the validity of the linear delay model under variation of load capacitance and device width.

Figure 3.7: (a) $t_{rb}$ variation with load capacitance for various latch sizes, (b) $\Delta V$ variation with latch width

In Section 3.3.1 we made Assumption 2, that the intermediate node voltage V(I) at $V_{in} = V_{DD}$ varies linearly in the linear region. We also observe from equations 3.12 and 3.13 that $V_1$ and $V_2$ vary linearly with $t_{rin}$. Figure 3.9(a) confirms this assumption. Figure 3.9(b) shows that the falling transition time at node I is also linear in the transition time $t_{rin}$ of the D input.

In Section 3.3.2 we made Assumption 4, from which the delay between nodes I and Q varies linearly with load capacitance $C_l$. Figure 3.10(a) confirms this. Figure 3.10(b) shows that the delay $T_{DI}$ between D and the intermediate node I remains almost constant with $C_l$, which also confirms Assumption 4. This confirms all the assumptions and observations made in Section 3.3.



Figure 3.8: (a) $V_1$ variation with latch width, (b) $t_{rb}$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model.

In the next chapter, we verify the linear delay model of equation 3.1 and the region-of-validity expressions of equations 3.29 and 3.36 at a different technology node.

Suppose that, for a D-latch of a given size, the values of $K_1$, $K_2$, $K_3$ and $t_{rb}$ are known from HSPICE simulation. Using the observations in Sections 3.3 and 3.4, we can then predict $K_1$, $K_2$, $K_3$, $t_{rbmin}$ and $t_{rb}$ for any other size of D-latch.
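A minimal Python sketch of this prediction step follows. It assumes the latch is upsized by a single width factor applied to both $W_n$ and $W_p$. Because Observations 3 and 4 state only that $K_2$ and $K_3$ are linear in $1/W$ (not proportional to it), the sketch fits each of them through two characterized sizes; $K_1$ and $t_{rbmin}$ are carried over unchanged and $t_{rb}$ is scaled as $1/W$ per equation 3.36. The helper name, dictionary keys and numbers are hypothetical.

```python
def predict_coefficients(ref_a, ref_b, width):
    """Hypothetical helper: predict delay-model coefficients for a latch of `width`.

    ref_a, ref_b: dicts for two characterised sizes with keys
    'width', 'k1', 'k2', 'k3', 't_rbmin', 't_rb' (width in multiples of minimum size).
    """
    def linear_in_inverse_width(key):
        # Fit value = intercept + slope / W through the two reference points.
        x_a, x_b = 1.0 / ref_a["width"], 1.0 / ref_b["width"]
        slope = (ref_b[key] - ref_a[key]) / (x_b - x_a)
        intercept = ref_a[key] - slope * x_a
        return intercept + slope / width

    return {
        "k1": ref_a["k1"],                                # Observation 2: width independent
        "k2": linear_in_inverse_width("k2"),              # Observation 3
        "k3": linear_in_inverse_width("k3"),              # Observation 4
        "t_rbmin": ref_a["t_rbmin"],                      # Observations 5-6: ~constant
        "t_rb": ref_a["t_rb"] * ref_a["width"] / width,   # equation (3.36): t_rb ~ 1/W_n
    }


if __name__ == "__main__":
    # Made-up characterisation results for 1x and 2x latches (ps, ps/fF).
    ref_1x = {"width": 1.0, "k1": 0.2, "k2": 3.5, "k3": 18.0, "t_rbmin": 12.0, "t_rb": 120.0}
    ref_2x = {"width": 2.0, "k1": 0.2, "k2": 2.0, "k3": 14.0, "t_rbmin": 12.0, "t_rb": 60.0}
    print(predict_coefficients(ref_1x, ref_2x, 4.0))
```

If $K_2$ and $K_3$ were taken to be strictly proportional to $1/W$, a single characterized size would suffice, as the text suggests; the two-point fit is the more conservative reading of "linear function".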



Figure 3.9: (a) V(I) at $V_{in} = V_{DD}$, (b) Falling transition time at the intermediate node.



Figure 3.10: (a) Delay between nodes I and Q, $T_{IQ}$, (b) Delay between D and node I, $T_{DI}$.

## Chapter 4

# Impact of Technology Scaling on the Delay Model

Chapter 3 presented our delay model. In this chapter, we repeat its validation at a different technology node to determine whether the model can migrate to a smaller technology node with reasonable accuracy.

### 4.1 Overview

With the continuous scaling of CMOS devices in deep-submicron technologies, a delay model is useful in STA only if it maintains its accuracy and validity under technology scaling and PVT variation. In this chapter, we show, using HSPICE simulations at the 32nm technology node, that our delay model for setup time maintains its validity and accuracy at a smaller technology node.

### 4.1.1 Delay Model Verification at 32nm Technology Node

In this section, we verify the validity of our model, and of our observations regarding its coefficients and region of validity, using HSPICE simulations at the 32nm technology node<sup>2</sup>. Our linear delay model for setup time is given by:

$$Delay = K_1 T_R + K_2 C_l + K_3 \tag{4.1}$$

In equation 4.1, $T_R$ is the input transition time ($t_{rin}$). We extract the model coefficients $K_1$, $K_2$ and $K_3$ from HSPICE simulations of the 32nm D-latch. Figure 4.1(a) plots the setup time against the input transition time at a given load capacitance, and Figure 4.1(b) plots the setup time against the load capacitance ($C_l$) at a given input transition time.
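As an illustration, the three coefficients of equation 4.1 can also be obtained with a single least-squares fit over all simulated $(t_{rin}, C_l)$ points inside the region of validity. This is a generic alternative sketch under our own assumptions, not necessarily the extraction used in this work (chapter 5 describes a two-sweep slope/intercept method); the function names are hypothetical.

```python
import numpy as np


def fit_delay_model(t_rin, c_load, setup_time):
    """Least-squares fit of Delay = K1*t_rin + K2*C_l + K3 (equation 4.1).

    Each array holds one entry per simulated (t_rin, C_l) point. Only
    points inside the region of validity should be passed in, since the
    linear model is not expected to hold outside it.
    """
    t = np.asarray(t_rin, dtype=float)
    c = np.asarray(c_load, dtype=float)
    y = np.asarray(setup_time, dtype=float)
    # Design matrix with columns [t_rin, C_l, 1] for K1, K2, K3.
    design = np.column_stack([t, c, np.ones_like(t)])
    (k1, k2, k3), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(k1), float(k2), float(k3)


def model_delay(k1, k2, k3, t_rin, c_load):
    """Evaluate equation 4.1 at one (t_rin, C_l) point."""
    return k1 * t_rin + k2 * c_load + k3
```

With at least three non-collinear points the fit is fully determined; extra points average out simulation noise.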

Figure 4.2(a) shows the delay $T_{DI}$ between the D and I nodes versus input transition time at a given load. Again, our delay model closely fits the simulation data between a lower limit $t_{rbmin}$ and an upper limit $t_{rb}$. For the coefficients of equation 4.1, it was observed in chapter 3 that $K_1$ is independent of D-latch size, whereas $K_2$ and $K_3$ vary linearly with $\frac{1}{W_n}$.

Figures 4.2(b), 4.3(a) and 4.3(b) confirm these observations using HSPICE simulations at the 32nm technology node.

Similarly, Figures 4.4(a), 4.4(b) and 4.4(c) show the variation of $t_{rb}$, $\Delta V$ and $V_1$, respectively, with D-latch size $W_n$. These plots verify observations 7-9 made in section 3.4.

We have thus verified all the above observations at the 32nm technology node, showing that our linear delay model remains valid and maintains its accuracy at 32nm.



Figure 4.1: (a) Setup time variation with input transition time for a given load capacitance, (b) setup time variation with load capacitance for a given input transition time. In both figures, points are simulated data and dotted lines are fits of the delay model.



Figure 4.2: (a) $T_{DI}$ delay variation with input transition time for a given load capacitance, (b) $K_1$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model.



Figure 4.3: (a) $K_2$ variation with latch width, (b) $K_3$ variation with latch width. In both figures, points are simulated data and dotted lines are fits of the delay model.



Figure 4.4: (a) $t_{rb}$ variation with load capacitance for various latch sizes, (b) $\Delta V$ variation with latch width, (c) $V_1$ variation with latch width, (d) $t_{rb}$ variation with latch width. In all figures, points are simulated data and dotted lines are fits of the delay model.

## Chapter 5

## **Results and Conclusion**

Chapter 3 presented and validated a linear delay model for setup time. In this chapter, we use the proposed delay model to characterize a standard cell library and quantify the resulting savings in the number of simulations and in computational resources.

### 5.1 Overview

In the previous chapters, we derived and validated a simple model for the setup time. We also showed that, provided an upper limit on the input transition time is respected, the setup time of a D-latch can be characterized with this linear model, which relates setup time linearly to the input transition time $(t_{rin})$ and the load capacitance $(C_l)$. Since the aim of this thesis is to reduce the characterization effort and time for nanoscale standard cell libraries under process, voltage and temperature (PVT) variation, we use our delay model to generate LUTs efficiently for library characterization. Our sample library consists of sequential latches of various sizes. In this chapter, we show the savings in the number of simulations achieved by our method of generating LUTs. Finally, we compare the setup times obtained from traditional LUTs, from our LUTs and from HSPICE simulation for D-latches of various sizes.


### 5.2 LUT using Delay Model

Chapter 1 explained the look-up table (LUT) approach used in STA to estimate setup time. For example, a $7 \times 7$ LUT contains 49 points, which are generally obtained through HSPICE simulation. Currently, industry obtains the setup time at every point of the LUT by HSPICE simulation, using the methodology described in section 2.4.1; we call such a LUT a traditional LUT. In an optimized LUT, i.e. a LUT built using the linear delay model, we use the simple models derived in chapter 3 for the points that lie in the region of validity described in section 3.4, and HSPICE simulation only for the points where the delay model is not valid. Characterization with our method requires only two sets of HSPICE simulations, after which the delay model gives the setup time at every LUT point within the region of validity. First, we plot the setup time against the input transition time $t_{rin}$ at a fixed load $C_l$: the slope gives $K_1$ and the intercept gives $K_2 C_l + K_3$. Next, we plot the setup time against the load capacitance $C_l$ at a fixed $t_{rin}$: the slope gives $K_2$ and the intercept gives $K_1 t_{rin} + K_3$. From these two relations, we calculate $K_1$, $K_2$ and $K_3$ for that particular D-latch width. For other latch sizes, $K_1$, $K_2$ and $K_3$ can be calculated using observations 1-9 of chapter 3. The upper bound $t_{rb}$ can likewise be calculated for any width, once it is known for one width, using the observations given in chapter 3. We can therefore predict the setup time with our delay model for any latch width, as long as the input transition time stays within $t_{rb}$; points beyond $t_{rb}$ require HSPICE simulation. In the next section, we enumerate the savings in the number of LUT points that must be obtained by simulation.
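The two-sweep extraction and the resulting LUT fill can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, $K_3$ is taken as the average of the estimates from the two intercepts (our choice), $t_{rbmin}$ is treated as a constant, and $t_{rb}$ is supplied as a function of $C_l$ because Figure 3.7(a) shows it varies with load. Entries left as NaN mark the points that would still need HSPICE.

```python
import numpy as np


def extract_two_sweeps(t_sweep, setup_vs_t, c_fixed, c_sweep, setup_vs_c, t_fixed):
    """Recover K1, K2, K3 from two linear sweeps (hypothetical helper).

    Sweep 1: t_rin varied at fixed load c_fixed
             -> slope K1, intercept K2*c_fixed + K3.
    Sweep 2: C_l varied at fixed transition time t_fixed
             -> slope K2, intercept K1*t_fixed + K3.
    """
    k1, icpt_t = np.polyfit(t_sweep, setup_vs_t, 1)
    k2, icpt_c = np.polyfit(c_sweep, setup_vs_c, 1)
    # Each intercept yields one estimate of K3; average the two.
    k3 = 0.5 * ((icpt_t - k2 * c_fixed) + (icpt_c - k1 * t_fixed))
    return float(k1), float(k2), float(k3)


def build_lut(t_axis, c_axis, k1, k2, k3, t_rb_min, t_rb_of_c):
    """Fill a setup-time LUT (rows: t_rin, columns: C_l) from the model.

    Points outside [t_rb_min, t_rb(C_l)] are left as NaN, i.e. they
    still have to be obtained by HSPICE simulation.
    """
    lut = np.full((len(t_axis), len(c_axis)), np.nan)
    for j, c in enumerate(c_axis):
        t_upper = t_rb_of_c(c)  # upper bound of the region of validity
        for i, t in enumerate(t_axis):
            if t_rb_min <= t <= t_upper:
                lut[i, j] = k1 * t + k2 * c + k3
    return lut
```

Counting the NaN entries of the returned table gives the number of HSPICE runs still required for that LUT.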

## 5.3 Reduction in Number of Simulations

In this section, we use the linear delay model derived in chapter 3 during LUT characterization for library generation, which reduces the number of SPICE simulations required to characterize a standard cell library. In our library, we have built LUTs for D-latches of various sizes. Table 5.1 shows the savings in SPICE simulations for LUT generation for D-latches of various sizes, for LUTs from 6x6 to 9x9 points.

| Latch size | 6x6 LUT | 7x7 LUT | 8x8 LUT | 9x9 LUT |
|------------|---------|---------|---------|---------|
| Reference  | 24      | 35      | 48      | 54      |
| 2          | 24      | 28      | 40      | 54      |
| 3          | 24      | 28      | 40      | 54      |
| 4          | 24      | 28      | 40      | 45      |
| 5          | 24      | 28      | 40      | 45      |

Table 5.1: Number of HSPICE simulations saved when characterizing the D-latch library using our delay model (DM).
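The percentage saving per LUT follows directly from Table 5.1, assuming each entry counts the points of an $n \times n$ LUT that no longer need an HSPICE run (our reading of the table). A short Python sketch of that arithmetic:

```python
# Saved HSPICE simulations from Table 5.1 (rows: latch size, columns: LUT size).
LUT_SIZES = [6, 7, 8, 9]
SAVINGS = {
    "Reference": [24, 35, 48, 54],
    "2": [24, 28, 40, 54],
    "3": [24, 28, 40, 54],
    "4": [24, 28, 40, 45],
    "5": [24, 28, 40, 45],
}


def saving_fraction(saved, n):
    """Fraction of the n x n LUT points that need no HSPICE run."""
    return saved / (n * n)


def overall_fraction(savings, lut_sizes):
    """Saved points divided by all LUT points, over every latch and LUT size."""
    total_saved = sum(sum(row) for row in savings.values())
    total_points = len(savings) * sum(n * n for n in lut_sizes)
    return total_saved / total_points


if __name__ == "__main__":
    for size, row in SAVINGS.items():
        pct = [f"{100 * saving_fraction(s, n):.1f}%" for s, n in zip(row, LUT_SIZES)]
        print(size, pct)
    print(f"overall: {100 * overall_fraction(SAVINGS, LUT_SIZES):.1f}%")
```

For example, 24 saved points in a 6x6 LUT corresponds to 24/36, about 66.7%.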

### 5.4 Accuracy of LUT generated using the model

In this section, we calculate the setup time of D-latches of various sizes using the model and its coefficients as derived in chapter 3, and compare it with the setup time obtained from HSPICE simulation. Figure 5.1 shows this comparison for varying $t_{rin}$, and Figure 5.2 for varying $C_l$.

Figures 5.1 and 5.2 show that the setup time predicted by our model is very close to the HSPICE result.
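One way to put a number on "very close" is the per-point relative error between the model and HSPICE setup times. The thesis reports this comparison graphically; the metric and the helper names below are our own assumptions, given as a minimal Python sketch.

```python
import numpy as np


def relative_errors(model_delay, spice_delay):
    """Per-point relative error |model - SPICE| / |SPICE|."""
    m = np.asarray(model_delay, dtype=float)
    s = np.asarray(spice_delay, dtype=float)
    return np.abs(m - s) / np.abs(s)


def summarize(model_delay, spice_delay):
    """Return (max, mean) relative error in percent over all compared points."""
    err = 100.0 * relative_errors(model_delay, spice_delay)
    return float(err.max()), float(err.mean())
```

Reporting the maximum as well as the mean guards against a single poorly fitted LUT point being hidden by an average.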

## 5.5 Conclusion and Future Work

We have shown that, if an upper bound on the input transition time is respected, a simple linear delay model relating setup time linearly to $t_{rin}$ and load capacitance $C_l$ is valid for D-latches of various sizes. We derived the region of validity and the dependence of the delay model coefficients on D-latch size $W_n$ (for equal rise and fall transition times), together with simple relations that express $t_{rb}$ as a function of $C_l$ and $W_n$. These relations were derived without using any device current or capacitance model; they rely only on the gate topology and the charging/discharging of the load stage. They are therefore expected to remain valid under technology scaling, and we verified this by extending the work to the 32nm technology node.

Using these relations, we show that standard cell library characterization can be done with significantly fewer simulations (a 66% reduction) while maintaining accuracy. This matters because, in deep-submicron technologies, PVT variation requires numerous cycles of standard cell library characterization at several PVT corners. The linear delay model can also be used to improve the accuracy of the LUT: since no simulation is needed to obtain the setup time inside the region of validity, the simulation budget can be spent on additional LUT points in the region where the delay is a highly non-linear function of $(t_{rin}, C_l)$ and our model fails. These additional points increase the accuracy of the LUT.

As future work, we would extend these relations to D-latch setup time in the presence of clock skew, and to hold time, in order to characterize sequential cells during standard cell library characterization.



Figure 5.1: (a) Comparison of setup time using the proposed delay model and HSPICE simulation for $W_p/W_n = 1.5/1$ with varying $t_{rin}$, (b) comparison of setup time using the proposed delay model and HSPICE simulation for $W_p/W_n = 3/2$ with varying $t_{rin}$, (c) comparison of setup time using the proposed delay model and HSPICE simulation for



Figure 5.2: (a) Comparison of setup time using proposed delay model and HSPICE simulation for  $W_p/W_n = 1.5/1$  for varying  $C_l$  (b) Comparison of setup time using proposed delay model and HSPICE simulation for  $W_p/W_n = 4.5/3$  for varying  $C_l$ .

## Bibliography

- [1] Louis Scheffer, Luciano Lavagno and Grant Martin, EDA for IC implementation, circuit design and process technology, CRC Press, 2006.
- [2] Sandeep Miryala, Baljit Kaur, Bulusu Anand and Sanjeev Manhas, "Efficient Nanoscale VLSI Standard Cell Library Characterization Using a Novel Delay Model," IEEE JSSC, pp. 584-594, April, 2010.
- [3] T. Sakurai and A. R. Newton, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," IEEE JSSC, pp. 584-593, April, 1990.
- [4] Feng Wang and Shie-Shen Chiang, "Scalable Polynomial Delay Model for Logic and Physical Synthesis," Proceedings of IEEE ICCD, August, 2000.
- [5] http://www.opensourceliberty.com
- [6] Ivan Sutherland, Bob Sproull and David Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann Publishers, 1999.
- [7] Emre Salman, Ali Dasdan, Feroze Taraporevala and Kayhan Kucukcakar, "Exploiting Setup-Hold-Time Interdependence in Static Timing Analysis," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, June, 2007.
- [8] Emre Salman, Ali Dasdan, Feroze Taraporevala, Kayhan Kucukcakar and Eby G. Friedman, "Pessimism Reduction in Static Timing Analysis Using Interdependent Setup and Hold Times," Proceedings of the 7th International Symposium on Quality Electronic Design (ISQED), 2006.

- [9] Jayashree Sridharan and Tom Chen, "Gate Delay Modeling With Multiple Input Switching for Static (Statistical) Timing Analysis," Proceedings of the 19th International Conference on VLSI Design (VLSID), 2006.
- [10] Yangang Wang and Mark Zwolinski, "Analytical Transient Response and Propagation Delay Model for Nanoscale CMOS Inverter," IEEE, 2009
- [11] SoYoung Kim and S. Simon Wong, "Closed-Form RC and RLC Delay Models Considering Input Rise Time," IEEE Transactions on Circuits and Systems I, Vol. 54, No. 9, 2007.
- [12] Hamed Abrishami, Safar Hatami and Massoud Pedram, "Analysis and Optimization of Sequential Circuit Elements to Combat Single-Event Timing Upsets," IEEE Tran. on Computer-Aided Design, Vol. 6, pp. 270-281, March, 1987.
- [13] Hoon Chang and Jacob A. Abraham, "An Efficient Critical Path Tracing Algorithm for Sequential Circuits," Microprocessing and Microprogramming, Vol. 40, pp. 913-916, Elsevier Science, 1994.
- [14] Amit Jain and David Blaauw, "Slack Borrowing in Flip-Flop Based Sequential Circuits," IEEE GLSVLSI'05, April 17-19, 2005.
- [15] Jorge Rubinstein, Paul Penfield Jr. and Mark Horowitz, "Signal Delay in RC Tree Networks," IEEE Transactions on Computer-Aided Design, CAD-2(3), pp. 202-211, July, 1983.
- [16] Jessica Qian, Satyamurthy Pullela and Lawrence Pillage, "Modeling the effective capacitance for the RC interconnect of CMOS gates," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 13, pp. 1526-1535, 1994.


- [17] Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolić, Digital Integrated Circuits: A Design Perspective, 2nd ed., Prentice Hall of India Pvt Ltd, New Delhi, 2006.

#### PUBLICATIONS

Yogendra Sharma, Baljit Kaur, Anand Bulusu and A. K. Saxena, "A Setup Time Model of Sequential Circuits for Efficient Standard Cell Library Characterization", VLSI Design Conference, 2012 (Submitted)

## Appendix A

## Layout of D-Latch

This appendix presents the layouts of the D-latch for various widths. We first draw the layout of the minimum-width latch and then use **fingering** to increase the width. The layouts were drawn in the **CADENCE VIRTUOSO LAYOUT EDITOR**, version 6.1.3.

## A.1 Sequence of Layouts

- Layout of minimum sized D-latch:
- Layout of 2\* minimum sized D-latch:
- Layout of 3\* minimum sized D-latch:
- Layout of 4\* minimum sized D-latch:
- Layout of 5\* minimum sized D-latch:



Figure A.1: Layout of minimum sized D-latch.



Figure A.2: Layout of 2\*minimum sized D-latch.




Figure A.3: Layout of 3\*minimum sized D-latch.



Figure A.4: Layout of 4\*minimum sized D-latch.



Figure A.5: Layout of 5\*minimum sized D-latch.