EE-382M VLSI-II

Early Design Planning: Front End

**Spring 2017**

Mark McDermott, Jacob Abraham, Gian Gerosa

#### **TLAs**

- EDP: Early Design Planning
- FE: Front End
- BE: Back End
- SOC: System-on-Chip
- SC: Standard Cell
- SDP: Structured Datapath
- PD: Physical Design
- STA: Static Timing Analysis
- .LIB: STA Library
- ABGEN: Abstract Generator
- APR: Auto Place & Route
- LEF: Library Exchange Format
- DEF: Design Exchange Format
- TTM: Time to Money

### **Agenda**

- Early Design Planning (EDP) objectives
- EDP-FE Flow
  - Design partitioning
  - Area estimation
  - Block & Unit floorplanning
  - Block & Unit route planning
  - Chip level floorplanning
  - Chip level route planning
  - Chip & block level power estimation
  - Chip & block level timing estimation
- Summary

# **EDP-FE Objectives**

- Get designers thinking about physical implementation while doing the architecture design.
- This avoids pitfalls that can cause die size growth, timing issues and power distribution problems.
- Give designers a procedure to floorplan high performance SOCs.
- The floorplan becomes the starting point for the chip plan iteration.
- It is also the starting point for block/unit/partition/cluster design, because it sets constraints such as block size and placement, feed-through plan, and power and clock distribution.

# **Design Flow Paradigm**

- HW/SW architecture, μarchitecture, logic, floorplan, timing, and power are optimized concurrently.
- Clusters and the top-level chip are optimized in parallel.
- Top-down budgeting with bottom-up feedback.
- Forward constraint-driven (timing, ...) and back-annotation (parasitic, area).

# **Design Convergence Iteration Profile**

# **Design Phases**

# **Ideal SOC Design Environment**

# **EE382M Design Flow**

### **Agenda**

- Early Design Planning (EDP) objectives
- EDP-FE Flow
  - Design partitioning
  - Area estimation
  - Block & Unit planning
  - Block & Unit route planning
  - Chip level floorplanning
  - Chip level route planning
  - Chip & block level power estimation
  - Chip & block level timing estimation
- Summary

### **EDP-FE Flow**

# **EDP-FE Flow (cont)**

1/24/17 EE382M-8 Class Notes Page 12

# **Partitioning: Building Blocks**

- Three types of building blocks are used in a VLSI chip:
  - SC: Standard Cell Block
    - Typically synthesized using a standard cell library.
    - Layout is done using Automatic Place & Route (APR) tools.
  - SDP: Structured Data Paths
    - Typically designed using DP libraries or the standard cell library used for SCs.
    - Layout is generated by tiling engines using relative placement constraints.
    - Routing can be done manually (for busses, clocks), with automated routers, or with a mix of both.
  - Custom Macros: Memory arrays, Register Files, CAMs, PLLs, Thermal sensors, off-chip IO buffers, voltage regulators, etc.
    - Memory generators can be used and will produce layout; high performance, custom designed arrays are typically done manually.
    - Semicustom design is also used: leaf cells are pure custom, but the block can be built with APR tools.

# **Partitioning: Logical vs. Physical Mapping**

#### **Area Estimation**

- Area estimation is accomplished using one of these three methods:
  - Scaling from previous design
    - Reasonably accurate method.
    - Modify area by direct multiplication of the previous area by a scaling factor.
    - The scaling factor is determined by the process technology group.
    - The scaling factor may be non-linear due to process scaling issues.
  - Manual estimation using spreadsheets
    - Least accurate method.
    - Requires estimating the number of logical elements that will be used.
    - Requires estimating the size of hard macros.
  - Synthesis of an existing design
    - Most reliable method, but not always possible during the early design phase.
    - Need to add a "fudge" factor to accommodate future growth (or shrinkage).
    - Need to estimate SC utilization percentages.

# **Block Size Estimation Spreadsheet Example**

- Block area is estimated using the same spreadsheet as the power estimation; because the project will use synthesis, results can be obtained after the APR steps.
- The spreadsheet accounts for the following:
  - Area utilization factors for each gate type
  - Block utilization factors

| Cell Type | 180nm Area (μm²) | 180nm Typical SC Density (gates/mm²) | 130nm Area (μm²) | 130nm Typical SC Density (gates/mm²) | 90nm Area (μm²) | 90nm Typical SC Density (gates/mm²) | 65nm Area (μm²) | 65nm Typical SC Density (gates/mm²) | 45nm Area (μm²) | 45nm Typical SC Density (gates/mm²) |
|---|---|---|---|---|---|---|---|---|---|---|
| INV | 26.8 | 28,006 | 9.7 | 77,160 | 3.5 | 204,082 | 1.3 | 562,266 | 0.5 | 1,549,099 |
| 2-NAND | 28.6 | 26,247 | 11.9 | 63,131 | 4.9 | 145,773 | 2.1 | 350,619 | 0.9 | 843,326 |
| 3-NAND | 26.4 | 28,447 | 16.2 | 46,296 | 10.0 | 72,333 | 6.1 | 117,721 | 3.8 | 191,589 |
| 4-NAND | 43.9 | 17,068 | 27.0 | 27,778 | 10.0 | 72,333 | 3.7 | 196,201 | 1.4 | 532,190 |
| 2-NOR | 28.6 | 26,247 | 11.9 | 63,131 | 4.9 | 145,773 | 2.1 | 350,619 | 0.9 | 843,326 |
| 3-NOR | 81.0 | 9,259 | 22.7 | 33,069 | 6.4 | 113,379 | 1.8 | 404,924 | 0.5 | 1,446,157 |
| 4-NOR | 76.7 | 9,780 | 29.2 | 25,720 | 11.1 | 64,935 | 4.2 | 170,771 | 1.6 | 449,105 |
| DFFR | 103.7 | 7,229 | 44.3 | 16,938 | 18.9 | 38,095 | 8.1 | 89,252 | 3.4 | 209,104 |
| SDFFR | 107.5 | 6,980 | 57.2 | 13,103 | 30.5 | 23,613 | 16.2 | 44,326 | 8.7 | 83,210 |
| Average | | 17,696 | | 40,703 | | 97,813 | | 254,078 | | 683,012 |

# **Block Size Estimation Spreadsheet Example**

UNIT: PUT YOUR BLOCK NAME

| Gate | Min Sized Transistors | Actual Gate Count (USER SPECIFIED) | Area per Logic Gate (μm²) | Utilization Factor | Total Area with Utilization Factor (μm²) | Transistor Density (Transistors/μm²) |
|---|---|---|---|---|---|---|
| inv | 3 | 2812 | 1.2 | 80.0% | 4218 | 2.00 |
| buf | 8 | 1868 | 2.4 | 80.0% | 5604 | 2.67 |
| triinv | 15 | 1 | 4.2 | 80.0% | 79 | 2.86 |
| clk_buf | 22 | 487 | 5.9 | 80.0% | 3592 | 2.98 |
| and2 | 11 | 676 | 3.2 | 75.0% | 2884 | 2.58 |
| and3 | 18 | 35 | 4.7 | 70.0% | 235 | 2.68 |
| and4 | 26 | 89 | 6.1 | 65.0% | 835 | 2.77 |
| nand2 | 8 | 2285 | 2.9 | 75.0% | 8835 | 2.07 |
| nand3 | 15 | 365 | 4.1 | 70.0% | 2138 | 2.56 |
| nand4 | 24 | 21 | 6.8 | 65.0% | 220 | 2.29 |
| nor2 | 10 | 1 | 2.9 | 75.0% | 12 | 2.59 |
| nor3 | 21 | 73 | 4.1 | 70.0% | 428 | 3.59 |
| nor4 | 36 | 57 | 6.8 | 65.0% | 596 | 3.44 |
| xor2 | 30 | 264 | 6.9 | 75.0% | 2429 | 3.26 |
| xnor2 | 30 | 169 | 7.2 | 75.0% | 1622 | 3.13 |
| aoi3 | 16 | 1640 | 9.0 | 70.0% | 21086 | 1.24 |
| aoi4 | 20 | 0 | 11.0 | 65.0% | 17 | 1.18 |
| oai3 | 16 | 888 | 13.0 | 70.0% | 16491 | 0.86 |
| oai4 | 20 | 12 | 17.0 | 65.0% | 314 | 0.76 |
| or2 | 15 | 220 | 3.4 | 75.0% | 997 | 3.31 |
| or3 | 23 | 23 | 4.7 | 70.0% | 154 | 3.43 |
| or4 | 23 | 23 | 6.2 | 65.0% | 219 | 2.41 |
| mux2 | 9 | 360 | 4.2 | 70.0% | 2160 | 1.50 |
| imux2 | 15 | 360 | 6.6 | 70.0% | 3394 | 1.59 |
| mux4 | 27 | 12 | 8.3 | 60.0% | 166 | 1.95 |
| imux4 | 39 | 0 | 11.0 | 60.0% | 18 | 2.13 |
| DFFR | 36 | 1339 | 8.1 | 55.0% | 19720 | 2.44 |
| SDFFR | 56 | 320 | 16.2 | 55.0% | 9425 | 1.90 |

AVERAGE TRANSISTOR DENSITY: 1.91

#### **Estimated area calculation**

- The block area estimate is determined by summing the SC/SDP area calculations with the Memory area calculation.

# **Block & Unit planning**

- Block planning is used to determine:
  - Block pin placement
    - Including through-cell routes
  - Hard macro placement
  - Aspect ratio of the block(s)

# **Block & Unit route planning**

- Block level route planning is used to:
  - Determine critical paths within a block
  - Determine the key pre-routes that need to be fed to the global router
  - Determine preliminary power grid routing

# **Chip Level floorplanning**

- Chip level floorplanning is used to:
  - Determine starting point(s) for block placement options
  - Determine aspect ratio option(s) for the chip

# **Chip Level route planning**

- Chip level route planning is used to:
  - Determine critical paths at the chip level
  - Determine the key pre-routes that need to be fed to the global router
  - Determine preliminary power grid routing at the top level of the chip

# **Chip & Block Level power estimation**

- Power estimates for SC and SDP blocks are based on data from Design Compiler.
- PTPX (Synopsys power analysis tool) will be used.
- Final block/unit power should include the following:
  - Block activity factors
  - Clock power
  - Memory power
  - Logic gate intrinsic power
  - Transistor gate leakage power
  - Transistor gate capacitance power
  - Interconnect wiring capacitance power
  - Source-drain leakage power
  - Signal switching factors
  - Glitching or spurious activity power

# **Activity Factor vs. Switching Factor**

- Activity Factor represents how often a specific block is active
  - Represented as a percentage of time
  - For example, an instruction fetch unit is active 80-90% of the time, whereas a debug unit would be active 1% of the time
- Switching Factor is also represented as a percentage and indicates how often the internal nodes of a specific block toggle
  - It is a function of the type of gate
    - For example, inverters switch all the time
    - 4-input NAND gates switch considerably less
    - Complex gates have even lower switching factors
  - Typical SC blocks have switching factors of about 15-25%, depending on the mix of logic
- Activity Factor is driven by the architecture
- Switching Factor is driven by circuit topology

# **Activity Factors (%)**

### **Clock Power Estimation**

Estimating clock power involves determining the gate capacitance (for dynamic power) and the number of routing tracks for a given floorplan area.

$$P_{Clock} = P_{static} + P_{diss}$$

$$P_{diss} = (AF)(C_{total})(V^2)(freq)$$

$$P_{static} = I_{leakage} \cdot V_{DD}$$

# **Memory Power Estimation**

- Most power dissipation for an array occurs in the bit-lines and sense amplifiers
- Calculate total bitline capacitance: {Metal2 bitline cap} + {junction cap} × {number of bitcells}
- Calculate the sense node capacitive load to include in power dissipation
- For power dissipation, we use the approximation:

  $$P_{dyn} = \alpha \cdot C_{total} \cdot V_{DD}^2 \cdot frequency$$

  where $\alpha$ is the "Activity Factor", $0 < \alpha < 1$
- Memory cells can contribute significant DC power due to leakage from the many cells in standby; be sure to take this into account

# **Logic Gate Intrinsic Power**

$$P_{gate-intrinsic} = (T_{count})(AF)(SF_{avg})(C_{j-den})(A)(V^2)(f)$$

- $T_{count}$ = Total Min Size Transistors
- $AF$ = Activity Factor
- $SF_{avg}$ = Average Switching Factor for whole block
- $C_{j-den}$ = Junction Capacitance for 65nm = 4.1 fF/μm²
- $A$ = Area of junction
- $V = V_{dd}$
- $f$ = Frequency

# **Sources of Leakage Power**

# **Transistor Gate Leakage Power**

- Gate oxide thickness < 2nm
- Direct tunneling of charge carriers through the gate oxide causes gate leakage to increase
- Was expected to grow by 500X per technology generation
- Was the dominant leakage component
- Was expected to contribute more than 15% - 20% of total power
- High-K gate dielectrics have alleviated this problem

Figure: Gate leakage current as a function of gate and drain bias for an NMOS device.

# **Transistor Gate Leakage Power**

$$P_{gate-leakage} = (T_{count})(W_{min})(L)(On_{\%})(G_{leakage})(V)$$

- $T_{count}$ = Total Min Size Transistors
- $W_{min}$ = Minimum Width
- $L$ = Minimum Length
- $On_{\%}$ = Percent On = 50% (all blocks)
- $G_{leakage}$ = Gate Leakage for 65nm = 15.6 nA/μm²
- $V = V_{dd}$

# **S-D Leakage Power**

- Technology scaling yields 30% smaller dimensions, which leads to higher energy consumption and power dissipation
- Voltage and Vt must both be scaled to contain the power increase and maintain a 30% gate delay reduction
- Leakage increases with scaled Vdd and Vt

### **S-D Leakage Power**

$$P_{S-D-leakage} = (T_{count})(W_{min})(S\text{-}D_{leakage})(SE)(V)$$

- $T_{count}$ = Total Min Size Transistors
- $W_{min}$ = Minimum Width
- $L$ = Minimum Length
- $S\text{-}D_{leakage}$ = Source-Drain Leakage = x.xx nA/μm²
- $SE$ = Stack Effect
- $V = V_{dd}$

#### **Interconnect Power**

- Becoming a large portion of power consumption at smaller technologies
- Up to 40% of total power
- Depends on physical info of layout / packing
- Assuming a square model for the
block, the interconnect length is estimated at Length / 5

#### **Interconnect Power**

$$P_{Intercon} = (Gate)(AF)(SF)(C_{avg\ wire-den})(IL)(V^2)(f)$$

- $Gate$ = Total Gate Count
- $AF$ = Activity Factor
- $SF$ = Switching Factor
- $C_{avg\ wire-den}$ = Average Wire Capacitance for M1-M4 = 0.21 fF/μm
- $IL$ = Interconnect Length (assume square block, divide by 5)
- $V = V_{dd}$
- $f$ = Frequency

#### **Glitch Power**

$$P_{glitch} = (15\%) (P_{gate} + P_{Intercon} + P_{gate-intrinsic})$$

- Glitches are spurious transitions that occur before the output approaches steady-state
- They are caused by unequal propagation delays of the input signals to a gate
- They can multiply as they propagate through combinational logic blocks
- Smaller technology nodes make interconnect delays more dominant, so delays are more uneven
- False switching can account for 10%-60% of total power (arithmetic modules)

# **Power Estimation Spreadsheet Example**

# **SC and SDP Power Estimation Spreadsheet**

The columns fall into a POWER CALCULATION group followed by an AREA CALCULATION group (the last four columns).

| Gate | Min Sized Transistors | Actual Gate Count (IMPORT from SYNOPSYS) | Stacking Factor | Total Min Sized Transistor Count | Total Gate Cap (fF) | Average Output Switching Factor | Transistor Width (microns) | Gate Intrinsic Power (mW) | Interconnect Power (mW) | Gate Power (mW) | Gate Leakage Power (mW) | S/D Leakage Power (mW) | Area per Logic Gate (μm²) | Utilization Factor | Total Area with Utilization Factor (μm²) | Transistor Density (Transistors/μm²) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ADDF_B | 47 | 1 | 0.75 | 47 | 12.2 | 12% | 6.1 | 1.725E-04 | 2.693E-05 | 3.197E-04 | 2.478E-06 | 1.210E-04 | 6.1 | 75.0% | 8.1 | 5.77 |
| ADDF_C | 64 | 1 | 0.75 | 64 | 16.6 | 12% | 8.3 | 2.349E-04 | 2.693E-05 | 4.354E-04 | 3.375E-06 | 1.647E-04 | 8.3 | 75.0% | 11.1 | 5.77 |
| AND2_B | 14 | 1 | 0.75 | 14 | 3.6 | 12% | 1.8 | 5.138E-05 | 2.693E-05 | 9.524E-05 | 7.382E-07 | 3.604E-05 | 1.8 | 75.0% | 2.4 | 5.77 |
| AND2_C | 18 | 1 | 0.75 | 18 | 4.7 | 12% | 2.3 | 6.606E-05 | 2.693E-05 | 1.225E-04 | 9.491E-07 | 4.633E-05 | 2.3 | 75.0% | 3.1 | 5.77 |
| AND2_D | 28 | 1 | 0.75 | 28 | 7.3 | 12% | 3.6 | 1.028E-04 | 2.693E-05 | 1.905E-04 | 1.476E-06 | 7.207E-05 | 3.6 | 75.0% | 4.9 | 5.77 |
| AND3_B | 18 | 1 | 0.67 | 18 | 4.7 | 9% | 2.3 | 6.606E-05 | 2.020E-05 | 1.225E-04 | 9.491E-07 | 4.139E-05 | 2.3 | 70.0% | 3.3 | 5.38 |
| AND3_C | 22 | 1 | 0.67 | 22 | 5.7 | 9% | 2.9 | 8.074E-05 | 2.020E-05 | 1.497E-04 | 1.160E-06 | 5.059E-05 | 2.9 | 70.0% | 4.1 | 5.38 |
| AND3_D | 26 | 1 | 0.67 | 26 | 6.8 | 9% | 3.4 | 9.542E-05 | 2.020E-05 | 1.769E-04 | 1.371E-06 | 5.979E-05 | 3.4 | 70.0% | 4.8 | 5.38 |
| AO21_B | 24 | 1 | 0.50 | 24 | 6.2 | 6% | 3.1 | 8.808E-05 | 1.347E-05 | 1.633E-04 | 1.265E-06 | 4.118E-05 | 3.1 | 70.0% | 4.5 | 5.38 |
| AO21_C | 26 | 1 | 0.50 | 26 | 6.8 | 6% | 3.4 | 9.542E-05 | 1.347E-05 | 1.769E-04 | 1.371E-06 | 4.462E-05 | 3.4 | 70.0% | 4.8 | 5.38 |
| AO21_D | 32 | 1 | 0.50 | 32 | 8.3 | 6% | 4.2 | 1.174E-04 | 1.347E-05 | 2.177E-04 | 1.687E-06 | 5.491E-05 | 4.2 | 70.0% | 5.9 | 5.38 |
| AO22_B | 28 | 1 | 0.50 | 28 | 7.3 | 6% | 3.6 | 1.028E-04 | 1.347E-05 | 1.905E-04 | 1.476E-06 | 4.805E-05 | 3.6 | 70.0% | 5.2 | 5.38 |
| AO22_C | 32 | 1 | 0.50 | 32 | 8.3 | 6% | 4.2 | 1.174E-04 | 1.347E-05 | 2.177E-04 | 1.687E-06 | 5.491E-05 | 4.2 | 70.0% | 5.9 | 5.38 |
| AOI21_A | 20 | 1 | 0.50 | 20 | 5.2 | 6% | 2.6 | 7.340E-05 | 1.347E-05 | 1.361E-04 | 1.055E-06 | 3.432E-05 | 2.6 | 60.0% | 4.3 | 4.62 |
| AOI21_B | 24 | 1 | 0.50 | 24 | 6.2 | 6% | 3.1 | 8.808E-05 | 1.347E-05 | 1.633E-04 | 1.265E-06 | 4.118E-05 | 3.1 | 70.0% | 4.5 | 5.38 |
| AOI21_C | 30 | 1 | 0.50 | 30 | 7.8 | 6% | 3.9 | 1.101E-04 | 1.347E-05 | 2.041E-04 | 1.582E-06 | 5.148E-05 | 3.9 | 70.0% | 5.6 | 5.38 |
| AOI22_A | 24 | 1 | 0.33 | 24 | 6.2 | 6% | 3.1 | 8.808E-05 | 1.347E-05 | 1.633E-04 | 1.265E-06 | 2.718E-05 | 3.1 | 65.0% | 4.8 | 5.00 |
| AOI22_B | 28 | 1 | 0.33 | 28 | 7.3 | 6% | 3.6 | 1.028E-04 | 1.347E-05 | 1.905E-04 | 1.476E-06 | 3.171E-05 | 3.6 | 65.0% | 5.6 | 5.00 |
| AOI22_C | 34 | 1 | 0.33 | 34 | 8.8 | 6% | 4.4 | 1.248E-04 | 1.347E-05 | 2.313E-04 | 1.793E-06 | 3.851E-05 | 4.4 | 65.0% | 6.8 | 5.00 |
| DFFR_E | 80 | 1 | 0.64 | 80 | 20.8 | 12% | 10.4 | 2.936E-04 | 2.693E-05 | 5.442E-04 | 4.218E-06 | 1.757E-04 | 10.4 | 55.0% | 18.9 | 4.23 |
| DFFSR_E | 92 | 1 | 0.64 | 92 | 23.9 | 12% | 12.0 | 3.377E-04 | 2.693E-05 | 6.259E-04 | 4.851E-06 | 2.021E-04 | 12.0 | 55.0% | 21.7 | 4.23 |
| DFFS_E | 80 | 1 | 0.64 | 80 | 20.8 | 12% | 10.4 | 2.936E-04 | 2.693E-05 | 5.442E-04 | 4.218E-06 | 1.757E-04 | 10.4 | 55.0% | 18.9 | 4.23 |
| INVERT_A | 2 | 1 | 1.00 | 2 | 0.5 | 28% | 0.3 | 7.340E-06 | 6.284E-05 | 1.361E-05 | 1.055E-07 | 6.864E-06 | 0.3 | 50.0% | 0.5 | 3.85 |
| INVERT_B | 4 | 1 | 1.00 | 4 | 1.0 | 28% | 0.5 | 1.468E-05 | 6.284E-05 | 2.721E-05 | 2.109E-07 | 1.373E-05 | 0.5 | 80.0% | 0.7 | 6.15 |
| INVERT_C | 6 | 1 | 1.00 | 6 | 1.6 | 28% | 0.8 | 2.202E-05 | 6.284E-05 | 4.082E-05 | 3.164E-07 | 2.059E-05 | 0.8 | 80.0% | 1.0 | 6.15 |
| INVERT_D | 8 | 1 | 1.00 | 8 | 2.1 | 28% | 1.0 | 2.936E-05 | 6.284E-05 | 5.442E-05 | 4.218E-07 | 2.746E-05 | 1.0 | 80.0% | 1.3 | 6.15 |
| INVERT_E | 10 | 1 | 1.00 | 10 | 2.6 | 28% | 1.3 | 3.670E-05 | 6.284E-05 | 6.803E-05 | 5.273E-07 | 3.432E-05 | 1.3 | 80.0% | 1.6 | 6.15 |
| INVERT_F | 12 | 1 | 1.00 | 12 | 3.1 | 28% | 1.6 | 4.404E-05 | 6.284E-05 | 8.163E-05 | 6.327E-07 | 4.118E-05 | 1.6 | 80.0% | 2.0 | 6.15 |
| INVERT_H | 14 | 1 | 1.00 | 14 | 3.6 | 28% | 1.8 | 5.138E-05 | 6.284E-05 | 9.524E-05 | 7.382E-07 | 4.805E-05 | 1.8 | 80.0% | 2.3 | 6.15 |
| LATSR_E | 40 | 1 | 0.60 | 40 | 10.4 | 28% | 5.2 | 1.468E-04 | 6.284E-05 | 2.721E-04 | 2.109E-06 | 8.237E-05 | 5.2 | 80.0% | 6.5 | 6.15 |
| MUX21_C | 32 | 1 | 0.63 | 32 | 8.3 | 3% | 4.2 | 1.174E-04 | 6.733E-06 | 2.177E-04 | 1.687E-06 | 6.864E-05 | 4.2 | 70.0% | 5.9 | 5.38 |
| MUX21_C | 40 | 1 | 0.63 | 40 | 10.4 | 3% | 5.2 | 1.468E-04 | 6.733E-06 | 2.721E-04 | 2.109E-06 | 8.580E-05 | 5.2 | 70.0% | 7.4 | 5.38 |
| MUX41_D | 64 | 1 | 0.50 | 64 | 16.6 | 3% | 8.3 | 2.349E-04 | 6.733E-06 | 4.354E-04 | 3.375E-06 | 1.098E-04 | 8.3 | 60.0% | 13.9 | 4.62 |
| NAND2_A | 8 | 1 | 0.50 | 8 | 2.1 | 12% | 1.0 | 2.936E-05 | 2.693E-05 | 5.442E-05 | 4.218E-07 | 1.373E-05 | 1.0 | 75.0% | 1.4 | 5.77 |
| NAND2_B | 16 | 1 | 0.50 | 16 | 4.2 | 12% | 2.1 | 5.872E-05 | 2.693E-05 | 1.088E-04 | 8.436E-07 | 2.746E-05 | 2.1 | 75.0% | 2.8 | 5.77 |
| NAND2_C | 24 | 1 | 0.50 | 24 | 6.2 | 12% | 3.1 | 8.808E-05 | 2.693E-05 | 1.633E-04 | 1.265E-06 | 4.118E-05 | 3.1 | 75.0% | 4.2 | 5.77 |
| NAND3_A | 15 | 1 | 0.33 | 15 | 3.9 | 8% | 2.0 | 5.505E-05 | 1.795E-05 | 1.020E-04 | 7.909E-07 | 1.699E-05 | 2.0 | 50.0% | 3.9 | 3.85 |
| NAND3_B | 30 | 1 | 0.33 | 30 | 7.8 | 8% | 3.9 | 1.101E-04 | 1.795E-05 | 2.041E-04 | 1.582E-06 | 3.398E-05 | 3.9 | 70.0% | 5.6 | 5.38 |
| NAND3_C | 45 | 1 | 0.33 | 45 | 11.7 | 8% | 5.9 | 1.652E-04 | 1.795E-05 | 3.061E-04 | 2.373E-06 | 5.097E-05 | 5.9 | 70.0% | 8.4 | 5.38 |
| NOR2_A | 10 | 1 | 0.50 | 10 | 2.6 | 9% | 1.3 | 3.670E-05 | 2.020E-05 | 6.803E-05 | 5.273E-07 | 1.716E-05 | 1.3 | 75.0% | 1.7 | 5.77 |
| NOR2_B | 20 | 1 | 0.50 | 20 | 5.2 | 9% | 2.6 | 7.340E-05 | 2.020E-05 | 1.361E-04 | 1.055E-06 | 3.432E-05 | 2.6 | 75.0% | 3.5 | 5.77 |
| NOR2_C | 30 | 1 | 0.50 | 30 | 7.8 | 9% | 3.9 | 1.101E-04 | 2.020E-05 | 2.041E-04 | 1.582E-06 | 5.148E-05 | 3.9 | 75.0% | 5.2 | 5.77 |
| NOR3_A | 21 | 1 | 0.33 | 21 | 5.5 | 7% | 2.7 | 7.707E-05 | 1.571E-05 | 1.429E-04 | 1.107E-06 | 2.378E-05 | 2.7 | 70.0% | 3.9 | 5.38 |

# **Total Power Calculations**

| | Power (mW) | Percentage |
|---|---|---|
| Memory Power | 0.00 | 0.0% |
| Gate Leakage Power | 0.01 | 0.3% |
| S-D Leakage Power | 0.50 | 12.5% |
| Gate Intrinsic Power | 0.84 | 20.8% |
| Gate Power | 0.46 | 11.4% |
| Intercon Power | 1.77 | 43.7% |
| Glitch Power | 0.46 | 11.4% |
| Total Power | 4.05 | 100.0% |

| | Area | Percentage |
|---|---|---|
| LOGIC AREA | 78744.11 | 73.0% |
| FLIP FLOP AREA | 29145.27 | 27.0% |
| MEMORY AREA | 0.00 | 0.0% |
| Total AREA (μm²) | 107889.38 | 100.0% |

Memory Power Calculation

| # Arrays | Memory Activity Factor (%) | (mW) | Power | Standby Power (mW) | (mW) |
|---|---|---|---|---|---|
| 1 | 25% | 0 | 0 | 0 | 0 |

# **Chip & Block Level timing estimation**

- The frequency of any given processor will be determined by the slowest speed path.
  - In synchronous (i.e., clocked) processors, this is defined as the time necessary to complete the logic in each pipe stage.
- Speed path components
  - State element launch time
  - Logic delay
  - Wire (RC) delay
  - State element setup time
  - Clock uncertainty

# **Typical Timing Closure Progression**

# **Critical-Path Analysis (cont)**

The frequency of any given processor will be determined by the slowest speed path.
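The way the speed-path components add up to a cycle time can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the course's own tool: it assumes, as the technology table in this section appears to, that every budget (logic depth, sequential overhead, RC, skew + jitter) is expressed as a multiple of the NAND2 FO3 delay and that the design margin is a fraction of their sum. The function names and parameter names are hypothetical.

```python
def cycle_time_ps(fo3_delay_ps, gates_per_stage=22.0, sequentials=3.0,
                  rc=1.0, skew_jitter=3.0, design_margin=0.10):
    """Estimate cycle time (ps) from budgets given in FO3-delay units.

    Assumption: each budget is a multiple of the NAND2 FO3 delay, and the
    design margin is applied as a fraction of the summed budgets.
    """
    base_ps = (gates_per_stage + sequentials + rc + skew_jitter) * fo3_delay_ps
    return base_ps * (1.0 + design_margin)


def fmax_ghz(cycle_ps):
    """Convert a cycle time in picoseconds to a frequency in GHz."""
    return 1000.0 / cycle_ps


if __name__ == "__main__":
    # 45nm FO3 delay at nominal VDD, then scaled by the table's relative
    # scaling factors (0.75 for 32nm, 0.80 for 22nm).
    fo3_45 = 17.5
    fo3_32 = fo3_45 * 0.75
    fo3_22 = fo3_32 * 0.80
    for node, fo3 in (("45nm", fo3_45), ("32nm", fo3_32), ("22nm", fo3_22)):
        t = cycle_time_ps(fo3)
        print(f"{node}: cycle = {t:.0f} ps, Fmax = {fmax_ghz(t):.2f} GHz")
```

For the 45nm column at 1.20 V, these assumptions reproduce the tabulated 558 ps cycle time and 1.79 GHz to within rounding.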
This is a simple example of the same micro-architecture over 3 CMOS generations:

| CMOS TECHNOLOGY | | 45nm | | 32nm | | 22nm | |
|---|---|---|---|---|---|---|---|
| NAND2 FO3 delay (ps) | | 17.5 | 35.0 | 13.1 | 23.6 | 10.5 | 16.8 |
| relative scaling | | NA | 2.00 | 0.75 | 1.80 | 0.80 | 1.60 |
| VDD (volts) | | 1.20 | 0.75 | 1.10 | 0.75 | 1.00 | 0.75 |
| average gates/pipe stage | 22.0 | 385 | 770 | 289 | 520 | 231 | 370 |
| sequentials | 3.0 | 53 | 105 | 39 | 71 | 32 | 50 |
| RC | 1.0 | 18 | 35 | 13 | 24 | 11 | 17 |
| skew + jitter | 3.0 | 53 | 105 | 39 | 71 | 32 | 50 |
| design margin | 10.0% | 51 | 102 | 38 | 69 | 30 | 49 |
| total cycle time (ps) | | 558 | 1117 | 419 | 754 | 335 | 536 |
| Fmax (GHz) | | 1.79 | 0.90 | 2.39 | 1.33 | 2.99 | 1.87 |

# **Determining Critical Speed Paths in Macro Blocks**

#### Standard Cell Blocks

- The primary mechanism for determining speed paths in synthesized logic is the timing tool in Design Compiler from Synopsys.
- You still have to manually inspect the synthesis results to confirm that speed paths are real and not an artifact of poor synthesis scripts.

#### Structured Data Paths

- These paths are determined by a combination of HSPICE and a standard timing tool like PrimeTime (PT) from Synopsys.

#### Custom Design: Memory & Hard Macros

- Speed-path analysis in custom design is done entirely with HSPICE.
- For the class project we will estimate the memory delays using the Synopsys memory compiler.

# **Critical Speed-Path Analysis in Verilog**

- Look for state elements in Verilog as the endpoints of each speed path
  - Flops
  - Latches
  - Memory arrays
- The easiest approach is to follow the clock signal
  - `always @(posedge clk or posedge rst) begin`
  - Note that logic can be embedded in the `always @` statement
- Beware of implicit flip-flops in memory arrays.
- Note that speed paths can traverse many levels of hierarchy and/or many different modules
- Different Verilog constructs will translate into different types of logic gates.

# **Summary**

- Early design planning during the FE design phase can have a significant impact on schedule and Time to Money (TTM)
- Making early architectural decisions is cheaper than making changes during the implementation phase
  - Late changes can cause the cancellation of the project because of missed market windows
- It is easier to "converge" the design when constraints are fed forward and feedback is returned.
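The spreadsheet arithmetic behind the area and power slides can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the area formula (gate count × area per gate ÷ utilization) is inferred from the spreadsheet rows (for example, inv: 2812 × 1.2 / 0.80 = 4218), the power formulas follow the dynamic power, gate leakage, and glitch power equations given in the slides, and all function and parameter names are hypothetical.

```python
def placed_area_um2(gate_count, area_per_gate_um2, utilization):
    """Area for one gate type after derating by its utilization factor.

    Assumption: inferred from the spreadsheet rows, not stated in the slides.
    """
    return gate_count * area_per_gate_um2 / utilization


def dynamic_power_w(activity_factor, c_total_farads, vdd_volts, freq_hz):
    """P = AF * C_total * V^2 * f (clock and memory dynamic power form)."""
    return activity_factor * c_total_farads * vdd_volts ** 2 * freq_hz


def gate_leakage_power_w(t_count, w_min_um, l_um, on_fraction,
                         g_leak_a_per_um2, vdd_volts):
    """P = T_count * W_min * L * On% * G_leakage * V (G_leakage in A/um^2)."""
    return (t_count * w_min_um * l_um * on_fraction
            * g_leak_a_per_um2 * vdd_volts)


def glitch_power(p_gate, p_intercon, p_gate_intrinsic, glitch_fraction=0.15):
    """Glitch power as a fixed fraction (15% in the slides) of switching power."""
    return glitch_fraction * (p_gate + p_intercon + p_gate_intrinsic)


if __name__ == "__main__":
    # Values taken from the area spreadsheet and the Total Power table.
    print(f"inv area: {placed_area_um2(2812, 1.2, 0.80):.0f} um^2")
    print(f"glitch power: {glitch_power(0.46, 1.77, 0.84):.2f} mW")
```

With the Total Power table's Gate, Intercon, and Gate Intrinsic entries (0.46, 1.77, and 0.84 mW), the 15% rule gives a glitch power that matches the tabulated 0.46 mW after rounding.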