# Survival of VLSI design – coping with device variability and uncertainty

Kevin Nowka, Sr. Mgr., VLSI Systems, IBM Austin Research Laboratory

Acknowledgements: Sani Nassif, Anne Gattiker (IBM Austin Research); Chandu Visweswariah, David Frank (IBM Watson Research); Lars Liebmann, Dan Maynard (IBM Server & Technology Group)

Motivation for overcoming variation (or at least coping)?

What is at stake? The VLSI economy

Very Large Scale Integration is: using more than 10k of these... to make these... to make these... to make these...

# The Secrets to this Success

Resilient CMOS VLSI Devices & Interconnect

- Simple Design Processes
  - Physical Abstraction with small number of rules
    - Simple design and design migration
    - Composable designs
  - Functional Abstraction
  - Resulting predictable functional & timing behavior
    - Cell-based design, place & route, static timing
- Scaled Lithography (and Manufacturing Process Improvements)
  - Lithography improvements and the application of Dennard Scaling Rules enabling Moore's Law

# 65nm technology and beyond

- Is the VLSI Economy in jeopardy because of "variability?"
  - What is variability?
  - What are the important sources of variability?
  - What are the effects on VLSI design?
  - How are fundamental design processes impacted?
  - How can we cope?



# Variability and Uncertainty

- Variability: a known quantitative relationship between design behavior (e.g., current, delay, power, noise margin, leakage, ...) and a source
  - The relationship can be accurately modeled, simulated, and compensated.
  - e.g., conductor thickness as a function of interconnect density.
- Uncertainty: sources unknown, or the model is too difficult/costly to generate or simulate
  - must be "budgeted" with some type of worst-case analysis
  - e.g., Vt as a function of dopant dose and placement
- Lack of modeling resources often transforms variability into uncertainty.
  - e.g., deterministic circuit switching activity factor

# Some Classes of VLSI Variability

### Physical

Changes in characteristics of devices and wires (manufacturing & aging). Time scale: $10^{9}$ sec (years).

### Functional

Changes in characteristics due to application cycles or workload changes. Time scale: $10^{7}$ to $10^{-6}$ sec (execution time).

### Environmental

Changes in supply voltage, temperature, local noise coupling. Time scale: $10^{-3}$ to $10^{-9}$ sec (clock tick).

### Informational

Lack of knowledge about design due to inadequate modeling. Time scale: ignorance cannot be measured in units of time.

Texas A&M 23 Oct 2007

### Lithography induced variability

- Subwavelength lithography
  - Using 193nm light to create < 30nm features
  - 29.5nm lines/spaces

### Imperfect Process Control

- Critical dimensions are sensitive to:
  - focus
  - dose (intensity and time)
  - resist sensitivity (chemical variations)
  - layer thicknesses
    - Intensity is affected by interference, which is strongly dependent on layer thicknesses.
    - Anti-reflection coatings help.
- Errors in alignment, rotation, and magnification
  - Result in either global or local shape-dependent …





### Imperfect Process Control (cont'd)

- Pattern sensitivity.
  - Interference effects from neighboring shapes.
    - Predominantly in same plane
    - Some buried feature interference for interconnect




# Line-edge roughness





# **Physical Variation Effects: Circuit Performance**

PSROs relative to reticle mean (05131SEA005.008): from 11% slower than mean to 13% faster than mean, on the same die!

Courtesy Anne Gattiker, IBM

#### Variation Effects: Not just ring oscillators... real microprocessors

- Multicore design: Core-0 was found to be ~15% slower than the other parts.
- Models predict that all parts of the design are identical.




# **Random Dopant Fluctuation**

- Threshold voltage depends on the doping within a device channel area.
  - The number of dopant atoms in the depletion layer of a MOSFET has been scaling roughly as Leff<sup>1.5</sup>.
  - The statistical variation in the number of dopants, N, scales as N<sup>1/2</sup>, so the relative variation, N<sup>-1/2</sup>, grows as N shrinks, causing increasing Vt uncertainty for small N.
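The $N^{1/2}$ argument can be checked numerically. The sketch below assumes the dopant count in a channel is Poisson-distributed (a common idealization, not something the slide states) and uses illustrative mean counts rather than device data; it estimates the relative spread $\sigma_N/N$ and compares it with $N^{-1/2}$.

```python
import random
import statistics


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Draw one Poisson(mean) variate by counting unit-rate arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)  # time of the first arrival
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)  # exponential gap to the next arrival
    return count


def relative_spread(mean_dopants: float, trials: int = 2000, seed: int = 1) -> float:
    """Empirical sigma_N / N for a Poisson-distributed dopant count."""
    rng = random.Random(seed)
    counts = [poisson_sample(mean_dopants, rng) for _ in range(trials)]
    return statistics.pstdev(counts) / statistics.fmean(counts)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"N = {n:4d}: sigma_N/N = {relative_spread(n):.3f}, N^-1/2 = {n ** -0.5:.3f}")
```

Combined with the slide's $N \propto L_{eff}^{1.5}$, the relative spread grows roughly as $L_{eff}^{-0.75}$: halving $L_{eff}$ raises $\sigma_N/N$ by about $2^{0.75} \approx 1.68\times$.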



Source: D. Frank et al., VLSI Tech '99; D. Frank and H. Wong, IWCE, May 2000

>200mV Vt Shift



### **NBTI and Hot-carrier-induced Variation**

#### Negative Bias Temperature Instability

- At high negative bias and elevated temperature, the pFET Vt gradually shifts more and more negative (reducing the pFET current).
  - The mechanism is thought to be the breaking of hydrogen-silicon bonds at the Si/SiO2 interface, creating surface traps and injecting positive hydrogen-related species into the oxide.
  - Associated with the <u>average NBTI shift</u>, there are also <u>random shifts</u>, which, even for identical use conditions and devices, cause mismatch shifts due to random variations in the number and spatial distribution of the charges/interface states formed.
- There are also other charge-trapping and hot-carrier defect-generation mechanisms that cause long-term Vt shifts in both nFETs and pFETs.
- Long-term Vt shifts are parameter variations that must be accounted for in the design of circuits.

Source: N. Rohrer, ISSCC 2006
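One way to put numbers on such a design budget is sketched below. It assumes the widely used empirical power-law form for the average shift, $|\Delta V_t| = A\,t^{n}$, and models the random component with a standard deviation proportional to the square root of the mean shift. Both forms, and the values of $A$, $n$, and the mismatch coefficient, are hypothetical placeholders, not numbers from the cited work.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def nbti_mean_shift_mv(stress_s: float, a_mv: float = 1.0, n: float = 0.2) -> float:
    """Average |dVt| in mV after stress_s seconds, assuming |dVt| = A * t**n (hypothetical A, n)."""
    return a_mv * stress_s ** n


def nbti_budget_mv(stress_s: float, k_sigma: float = 3.0, a_mv: float = 1.0,
                   n: float = 0.2, mismatch_coeff: float = 0.5) -> float:
    """Mean shift plus k_sigma times an assumed random (mismatch) component.

    Hypothetical model: sigma_mv = mismatch_coeff * sqrt(mean shift in mV).
    """
    mean_mv = nbti_mean_shift_mv(stress_s, a_mv, n)
    sigma_mv = mismatch_coeff * math.sqrt(mean_mv)
    return mean_mv + k_sigma * sigma_mv


if __name__ == "__main__":
    for years in (1, 5, 10):
        t = years * SECONDS_PER_YEAR
        print(f"{years:2d} y: mean {nbti_mean_shift_mv(t):5.1f} mV, "
              f"3-sigma budget {nbti_budget_mv(t):5.1f} mV")
```

With a sub-linear exponent, a tenfold longer lifetime raises the mean shift by only $10^{n}$ (about $1.6\times$ for $n = 0.2$), but the budget must still be set at end of life.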




# Back-end Variability -- CMP











>10% dynamic supply droop

- Supply variation is due to input variation (e.g., battery lifecycle) and to self-generated and coupled supply noise.
  - Supply variation affects performance, power, and reliability.
- Thermal variation is due to ambient fluctuation and self-heating.
  - Thermal variation affects performance and reliability.
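To see why a >10% supply droop matters for timing, the sketch below uses the Sakurai-Newton alpha-power law, gate delay $\propto V_{dd}/(V_{dd}-V_t)^{\alpha}$, as a stand-in delay model. The threshold voltage (0.3 V), exponent ($\alpha = 1.3$), and 1.0 V nominal supply are hypothetical values chosen for illustration, not data from this work.

```python
def alpha_power_delay(vdd: float, vt: float = 0.3, alpha: float = 1.3) -> float:
    """Relative gate delay under the alpha-power law (load and constants dropped)."""
    if vdd <= vt:
        raise ValueError("Vdd must exceed Vt for the gate to switch")
    return vdd / (vdd - vt) ** alpha


def droop_slowdown(vdd_nom: float, droop_frac: float, vt: float = 0.3, alpha: float = 1.3) -> float:
    """Fractional delay increase when the supply droops by droop_frac of its nominal value."""
    drooped = alpha_power_delay(vdd_nom * (1.0 - droop_frac), vt, alpha)
    return drooped / alpha_power_delay(vdd_nom, vt, alpha) - 1.0


if __name__ == "__main__":
    for droop in (0.05, 0.10, 0.15):
        print(f"{droop:.0%} droop -> {droop_slowdown(1.0, droop):.1%} slower")
```

With these assumed parameters, a 10% droop costs roughly 10% in gate delay, and the penalty grows faster than linearly as $V_{dd}$ approaches $V_t$.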

# 45nm technology and beyond

- Is the VLSI Economy in jeopardy because of "variability?"
  - What is variability?
  - What are the important sources of variability?
  - What are the effects on VLSI design?
  - How are fundamental design processes impacted?
  - How can we cope?

# Revisiting....the Secrets to Success

## Resilient CMOS VLSI Devices & Interconnect

- Simple Design Processes
  - Physical Abstraction with small number of rules
    - Simple design concepts and design migration
    - Composable designs
  - Functional Abstraction
  - Resulting predictable functional & timing behavior
    - Cell-based design, place & route, static timing
- Scaled Lithography (and Manufacturing Process Improvements)
  - Lithography improvements and the application of Dennard Scaling Rules enabling Moore's Law





# What has changed?

### Resiliency & redundancy cannot be ignored.

Need to start design assuming partial functionality!



# 1980: Abstraction – the great enabler

With abundant performance, it became possible to abstract design to a few simple rules. Thus came the age of "chip computer science" and equality for all designers!



Figure: magazine cover, October 20, 1981, Annual Technology Update Issue: "Programmable VLSI forces software to the forefront, …"




#### Physical Abstraction 2003: Abstract this!

Technology has become so complex that it is not well represented by "rules".

- Rules were developed to deal with defects.
  - They are insufficient for capturing systematic, statistical variability relations.
- Maybe "migratable design" was just a dream after all...

Figure: # Design Rules by technology node (500nm, 350nm, 250nm, 180nm, 130nm, 90nm) and lithography wavelength ($\lambda = 365$ nm, 248 nm, 193 nm), '86 to '10.

# What has changed?

- Resiliency & redundancy cannot be ignored.
  - Need to start design assuming partial functionality!

## Mead-Conway design is dead...

- Physical abstraction is broken: ground-rule explosion
- Physical abstraction is broken: composability in jeopardy
- Functional abstraction is broken: it is increasingly difficult to treat these as "logic devices"
- Transistor performance is determined by new features and phenomena, therefore there is a large variety of behaviors (not easily bounded).

### Key Factor: Variability

# Litho and Physical Abstraction ca. 1990

Before the advent of deep sub-wavelength lithography, the salient properties of a transistor were determined by geometries very local to the device itself!

Figure: lithography wavelength ($\lambda = 365$ nm, 248 nm, 193 nm), '86 to '10, with 1990 marked.

# Litho and Physical Abstraction ca. 2000

As scaling required resolution enhancement and optical proximity correction, the number of shapes that determine the final outcome increased.















## So now what?

Back to days of the "Hero Designer?"

Or Cope? -- fix the incomplete technology specification, modify the abstractions, validate the models, and change the design practices.

Just how many Hero Designers are there in VLSI?

# Coping – part 1

- Know thine enemy: "You can't fix what you can't measure"
  - Build structures to measure variation effects and causes: density & pattern sensitivities, CAA, threshold variation, matching...
  - Capture significant variation effects in models
  - In-situ variation sensing through on-die monitor circuits
    - thermal, performance ROs, supply, aging, …
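As an illustration of how a performance ring-oscillator reading becomes a variation estimate: a ring of $N$ inverting stages ($N$ odd) oscillates with period $2N\,t_{stage}$, so each frequency reading gives an average stage delay, and the ratio to a reference gives the local slowdown. The monitor names, stage count, and frequencies below are hypothetical.

```python
def stage_delay_ps(freq_hz: float, stages: int) -> float:
    """Average per-stage delay (ps) of a ring oscillator with an odd number of inverting stages."""
    if stages < 3 or stages % 2 == 0:
        raise ValueError("a simple inverter ring needs an odd stage count >= 3")
    return 1e12 / (2 * stages * freq_hz)


def slowdown(measured_hz: float, reference_hz: float) -> float:
    """Fractional delay increase versus reference (positive = slower); delay is proportional to 1/f."""
    return reference_hz / measured_hz - 1.0


if __name__ == "__main__":
    # Hypothetical readings from four on-die monitors, all 51-stage rings.
    readings_hz = {"ro_core0": 0.87e9, "ro_core1": 1.00e9, "ro_core2": 1.02e9, "ro_core3": 0.99e9}
    ref_hz = sum(readings_hz.values()) / len(readings_hz)  # die mean used as the reference
    for name, f in readings_hz.items():
        print(f"{name}: {stage_delay_ps(f, 51):.2f} ps/stage, {slowdown(f, ref_hz):+.1%} vs die mean")
```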









Courtesy Anne Gattiker, IBM



# Coping – part 2

### Fix the abstraction and the design process

- Use modeled behavior to drive physical and functional abstraction
  - Incorporate sensitivities into physical abstraction, e.g., raise the level of physical abstraction for cells
  - Incorporate sensitivities into timing abstraction, e.g., statistical static timing
- Variation-aware DA (placement, routing, buffer insertion...)
- Recognize that rampant variability should be treated as a defect
  - Test for the tails: at-speed scan tests
  - Cut out the tails: e.g., SRAMs with Vt-induced Vmin issues should be mapped out with redundant rows/columns
  - With 80 cores, can't you just turn the worst one or two into decoupling capacitors?
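The "cut out the tails" option can be quantified with a simple binomial model: if each of $n$ identical units (SRAM rows, columns, or cores) fails independently with probability $p$, and up to $k$ failures can be mapped out or disabled, the yield is the probability of at most $k$ failures. This is a sketch under that independence assumption; systematic, spatially correlated variation (such as the slow Core-0 above) violates it. The 2% failure probability is hypothetical.

```python
from math import comb


def yield_with_spares(units: int, spares: int, p_fail: float) -> float:
    """P(at most `spares` of `units` fail), assuming independent, identical unit failures."""
    return sum(
        comb(units, k) * p_fail ** k * (1.0 - p_fail) ** (units - k)
        for k in range(min(spares, units) + 1)
    )


if __name__ == "__main__":
    # Hypothetical: 80 cores, each with a 2% chance of missing its power/performance target.
    for spares in (0, 1, 2):
        print(f"80 cores, {spares} may be disabled: yield {yield_with_spares(80, spares, 0.02):.1%}")
```

Under these assumptions, allowing two of the 80 cores to be turned off raises yield from roughly 20% to roughly 78%.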

## **Statistical Static Timing**

#### Path-based SSTA

- Conduct a nominal timing analysis
- Select a representative set of critical paths
- Model the delay of each path as a function of random variables (the underlying sources of variation)
- Predict the parametric yield curve, as well as generate diagnostics (integration of a feasible region in parameter space)
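The path-based steps above can be sketched with Monte Carlo sampling standing in for the analytical feasible-region integration. The sketch assumes each critical path delay is linear in a few shared, unit-normal variation sources plus an independent random term; the path data (in ps) and the function name `parametric_yield` are hypothetical.

```python
import random

# Hypothetical critical paths: (nominal delay ps, sensitivities to shared sources, independent sigma ps)
PATHS = [
    (100.0, [4.0, 2.0], 1.5),
    (98.0, [3.0, 3.5], 2.0),
    (95.0, [5.0, 1.0], 1.0),
]


def parametric_yield(paths, clock_periods, samples=10000, seed=7):
    """Monte Carlo estimate of P(max path delay <= T) for each clock period T."""
    rng = random.Random(seed)
    n_sources = len(paths[0][1])
    worst = []
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_sources)]  # shared (correlated) sources
        worst.append(max(
            d0 + sum(s * xi for s, xi in zip(sens, x)) + r * rng.gauss(0.0, 1.0)
            for d0, sens, r in paths
        ))
    return {t: sum(w <= t for w in worst) / samples for t in clock_periods}


if __name__ == "__main__":
    for period, y in parametric_yield(PATHS, [95, 100, 105, 110, 115]).items():
        print(f"T = {period} ps: yield {y:.1%}")
```

Sweeping the clock period $T$ traces out the parametric yield curve; the shared sources are what make the path delays correlated.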

EinsStat (an IBM tool) models all timing arcs and produces all timing results in a canonical first-order form.
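For reference, a first-order canonical form of the kind used in block-based SSTA can be written as below, where $a_0$ is the mean value, each $\Delta X_i$ is a shared (global) source of variation normalized to zero mean and unit variance, $a_i$ is the sensitivity to it, and $\Delta R_a$ is an independent unit-normal random term with coefficient $a_{n+1}$. The notation is a sketch and may differ from EinsStat's internal representation.

$$
a = a_0 + \sum_{i=1}^{n} a_i\,\Delta X_i + a_{n+1}\,\Delta R_a,
\qquad
\sigma_a^2 = \sum_{i=1}^{n} a_i^2 + a_{n+1}^2,
\qquad
\operatorname{cov}(a, b) = \sum_{i=1}^{n} a_i\, b_i .
$$

Because correlation enters only through the shared $\Delta X_i$ terms, adding two quantities in this form (e.g., an arrival time and a gate delay) stays in the same form: the $a_0$ and $a_i$ coefficients add, and the independent terms combine as $\sqrt{a_{n+1}^2 + b_{n+1}^2}$.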





# Coping – part 3

### <u>"Bob and weave" – Adapt design for variation</u>

- If it's functional, then adapt to spatial/temporal variation:
  - split/multiple supplies
  - body bias
  - DVFS
  - thermal throttling
  - power and performance efficiency-based job scheduling
- Does variation-induced timing variation warrant a fundamental shift from synchronous systems to inherent timing adaptation?
  - Is 2X die-to-die and 50% within-die variation sufficient?
  - If half of this is systematic and can be nulled out, where do we spend our effort?



## Technology Trend For a Simple Buffer

- Simplest possible circuit (if this fails, everything else will).
  - Performed analysis for 90nm, 65nm and 45nm.
  - Clear trend in sigma!




SRAM is known to be a more sensitive circuit... (lower $\sigma$).

- But the circuit is optimized for each technology (no redundancy included).
- Much lower $\sigma$ values, and a similar trend in sigma!




## Impact of A/B/C Sigma on Chip Design

- The values of sigma determine:
  - Whether to build adaptation into the chip
  - Whether to include redundancy in the chip
  - The size of "yieldable" components on the chip
- Such activities are already routine in the design of SRAM.
  - But such techniques are not well developed for standard logic design...
  - Different technology sensitivities of SRAM vs. logic make the problem difficult




## **Ultimate Vision**

Get to the point where site-specific, hardware-derived models are ubiquitously available, enabling accurate model-to-hardware correlation and sophisticated design adaptation.



# Summary Trends and Challenges

- Trends/Challenges
  - Variability is increasing as Design/Manufacturing interface complexity rises.
    - More design rules, more 2<sup>nd</sup> order effects, more systematic variations, more correction steps...
  - Current techniques are insufficient
    - Abstractions no longer good enough
    - Predictability is poor
      - Ability to confidently bound performance is degrading.
      - Frequent model/hardware mismatch.

#### Required Action

- Better, targeted measurements through characterization structures
- Hardware-driven variation-enabled modeling
  - Corners are no longer sufficient: statistical timing
- Technology-aware circuit and PD tools
  - Variation tolerance in design
  - Technology-aware physical design, redundancy, adaptation.