RTL To GDS

Synthesis

Synthesis with Design Compiler includes the following main tasks:

 Reading in the design
 Setting constraints
 Optimizing the design
 Analyzing the results
 Saving the design database

Reading in the design

The first task in synthesis is to read the design into Design Compiler memory. Reading in an HDL design description consists of two tasks: analyzing and elaborating the description. The analysis command (analyze) performs the following tasks:

 Reads the HDL source and checks it for syntax errors.
 Creates HDL library objects in an HDL-independent intermediate format and saves these intermediate files in a specified location (./SYN/WORK/ in this tutorial).

If the analysis reports errors, they must be fixed and the design reanalyzed before continuing.

The elaboration command (elaborate) does, for example, the following:

 Translates the design into a technology-independent design (GTECH) from the intermediate files produced during analysis.
 Allows changing of parameter values (generics) defined in the source code.
 Replaces the HDL arithmetic operators in the code with DesignWare components.
 Automatically executes the link command, which resolves design references.

Note: check the elaboration reports carefully to see the number and type of memory elements Design Compiler thinks it should infer, and whether you agree with it. A badly modeled hardware description may result in excessive or wrong types of memory elements being inferred.

At this point, if the elaboration completed successfully, the design is represented in GTECH
format, which is an internal, equation-based, technology-independent design format.
Constraining the design

The next task is to set the design constraints. Constraints are the instructions that the designer gives
to Design Compiler. They define what the synthesis tool can or cannot do with the design or how
the tool behaves. Usually this information can be derived from the various design specifications
(e.g. from timing specification).

There are basically two types of design constraints:

Design Rule Constraints

Design rule constraints are implicit constraints: they are defined by the ASIC vendor in the technology library. By specifying the technology library that Design Compiler should use, you also specify all the design rules in that library. You cannot discard or override these rules.

Optimization Constraints

Optimization constraints are explicit constraints (set by the designer). They describe the design goals (area, timing, and so on) the designer has set for the design and serve as instructions to Design Compiler on how to perform synthesis.

Design rule constraints comprise:

Maximum transition time

Longest time allowed for a driving pin of a net to change its logic value

Maximum fanout

Maximum fanout for a driving pin

Maximum (and minimum) capacitance

The maximum (and minimum) total capacitive load that an output pin can drive. The total capacitance comprises load pin capacitance and interconnect capacitance.
Cell degradation

Some technology libraries contain cell degradation tables. The cell degradation tables list the
maximum capacitance that can be driven by a cell as a function of the transition times at the inputs
of the cell.

The optimization constraints comprise timing and maximum area constraints. The most common
timing constraints include:

System clock definition and clock delays.

Clock constraints are the most important constraints in your ASIC design. The clock signal is the
synchronization signal that controls the operation of the system. The clock signal also defines the
timing requirements for all paths in the design. Most of the other timing constraints are related to
the clock signal.

Multicycle paths

A multicycle path is an exception to the default single cycle timing requirement of paths. That is,
on a multicycle path the signal requires more than a single clock cycle to propagate from the path
start point to the path endpoint.

Input and output delays.

Input and output delays constrain external path delays at the boundaries of a design. Input delay is used to model the path delay from external inputs to the first registers in the design. Output delay constrains the path from the last registers to the outputs of the design.
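As a sketch of how these boundary constraints partition the clock period, here is a small budget calculation (all numbers are hypothetical):

```python
# Hypothetical timing budget: how input and output delay constraints
# partition a clock period between external and internal logic.
clock_period_ns = 10.0   # assumed 100 MHz clock
input_delay_ns = 4.0     # assumed delay from external logic to our inputs
output_delay_ns = 3.0    # assumed delay from our outputs to external capture

# Time left for internal logic on an input path (input -> first register):
input_path_budget = clock_period_ns - input_delay_ns
# Time left on an output path (last register -> output):
output_path_budget = clock_period_ns - output_delay_ns

print(input_path_budget)   # 6.0 ns available for input-side internal logic
print(output_path_budget)  # 7.0 ns available for output-side internal logic
```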

Minimum and maximum path delays

Minimum and maximum path delays allow constraining paths individually and setting specific
timing constraints on those paths.

Input transition and output load capacitance

These constraints can be used to constrain the input slew rate and output capacitance on input and
output pins.
False paths

A false path is a path that cannot propagate a signal. For example, a path that is not activated by
any combination of inputs is a false path.

Note that Design Compiler tries to meet both design rule and optimization constraints but design
rule constraints always have precedence over the optimization constraints. This means that Design
Compiler can violate optimization constraints if necessary to avoid violating design rule
constraints.

Examples that follow show how to set these constraints.

Defining Design Environment

You also need to describe the environment in which the design is supposed to operate. The design
environment description includes:

Defining Operating Conditions

Operating conditions account for the variations in the process, voltage, and temperature (PVT) ranges a design is expected to encounter. These variations are taken into account through operating condition specifications in the technology library. Cell and wire delays are scaled according to these conditions.

Modeling Wire Loads

Wire load models are used to estimate the effect of interconnect nets on capacitance, resistance,
and area before real data is obtained from the actual layout. These models are statistical models
and they estimate the wire length as a function of net's fanout.
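A wire load model can be sketched as a small lookup table with linear extrapolation beyond the characterized fanouts; the table values and capacitance-per-micron figure below are hypothetical, not from any real library:

```python
# A toy wire load model: estimated wire length (and hence capacitance)
# as a function of a net's fanout. Table values are hypothetical;
# real models come from the technology library.
wire_length_by_fanout = {1: 10.0, 2: 18.0, 3: 25.0, 4: 31.0}  # microns
cap_per_micron = 0.0002  # assumed pF per micron of wire

def estimate_wire_cap(fanout):
    # For fanouts beyond the table, extrapolate using the last slope,
    # which is how statistical wire load models typically behave.
    if fanout in wire_length_by_fanout:
        length = wire_length_by_fanout[fanout]
    else:
        slope = wire_length_by_fanout[4] - wire_length_by_fanout[3]
        length = wire_length_by_fanout[4] + slope * (fanout - 4)
    return length * cap_per_micron

print(estimate_wire_cap(2))  # ~0.0036 pF for a fanout-2 net
print(estimate_wire_cap(6))  # extrapolated estimate for fanout 6
```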

Optimizing the Design

The following section describes the Design Compiler optimization step. The optimization step translates the HDL description into a gate-level netlist using the cells available in the technology library. Optimization is done in several phases, and in each phase different optimization techniques are applied according to the design constraints. The following is a somewhat simplified description of the optimizations performed during synthesis. Design Compiler performs optimizations on three levels: architectural, logic-level, and gate-level.
Architectural Optimizations

Architectural optimizations are high-level optimizations which are performed on the HDL
description level. These optimizations include tasks such as:

Arithmetic Optimizations

Arithmetic optimization uses the rules of algebra to improve the implementation of the design.
That is, Design Compiler may rearrange the operations in arithmetic expressions according to the
constraints to minimize the area or timing.

Resource Sharing

Resource sharing tries to reduce the amount of hardware by sharing hardware resources among multiple operators in your HDL description. For example, a single adder component may be shared by multiple addition operators in the HDL code. Without resource sharing, each operator in your code results in a separate hardware component in the final circuitry.

Selecting DesignWare Implementations

Selecting a DesignWare implementation means that the implementation selection for a particular resource is left to Design Compiler. For example, the Basic IP Library contains two implementations (ripple and carry-lookahead) for the '+' operator (the DesignWare Foundation Library provides more implementations for '+' and other operators). When selecting a DesignWare implementation, Design Compiler considers all available implementations and makes its selection according to your constraints.

At this point, the design is represented by GTECH library parts (i.e. generic, technology-
independent netlist).

Logic-level Optimizations

Logic-level optimizations are performed on the GTECH netlist and consist of two processes: structuring and flattening.

Structuring

Structuring evaluates the design equations represented by the GTECH netlist and tries, using Boolean algebra, to factor out common subexpressions in these equations. The subexpressions that have been identified and factored out can then be shared between the equations. For example:

Before structuring:    After structuring:
P = ax + ay + c        P = aI + c
Q = x + y + z          Q = I + z
                       I = x + y

Structuring is usually recommended for designs with regular structured logic.
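The structuring example above can be checked exhaustively: factoring out I = x + y must leave the truth tables of P and Q unchanged. A small sketch in Python:

```python
# Verify that the structuring example preserves logic: factoring
# I = x + y out of P and Q does not change their truth tables.
from itertools import product

for a, x, y, c, z in product([0, 1], repeat=5):
    p_before = (a and x) or (a and y) or c   # P = ax + ay + c
    q_before = x or y or z                   # Q = x + y + z
    i = x or y                               # shared subexpression I
    p_after = (a and i) or c                 # P = aI + c
    q_after = i or z                         # Q = I + z
    assert p_before == p_after and q_before == q_after

print("factored form is equivalent")
```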

Flattening

Flattening tries to convert logic into a two-level, sum-of-products representation. Flattening produces fast logic (by minimizing the levels of logic between the inputs and outputs) at the expense of increased area. Flattening is recommended for designs containing unstructured or random logic.

Gate-level Optimizations

Gate-level optimizations work on the technology-independent netlist and map it to the library cells to produce a technology-specific gate-level netlist. Gate-level optimizations include the following processes:

Mapping

This process maps the cells of the technology-independent netlist (GTECH) to the cells in the library specified by the target_library variable.

Delay Optimization

Delay optimization fixes the timing violations introduced by the mapping phase.

Design Rule Fixing

Design rule fixing fixes the design rule violations in the design. Basically, this means that Design Compiler inserts buffers or resizes existing cells. Note that the design rule fixing phase is allowed to break timing constraints.
Area Optimization

Area optimization is the last step that Design Compiler performs on the design. During this phase,
only those optimizations that don't break design rules or timing constraints are allowed.

Note: the optimizations Design Compiler performs (or does not perform) depend on the constraints
you set. Therefore, setting realistic constraints is one of the most important synthesis tasks.

Reporting and Analyzing the Design

Once the synthesis has been completed, you need to analyze the results. Design Compiler provides
together with its graphical user interface (Design Vision) various means to debug the synthesized
design. These include both textual reports that can be generated for different design objects and
graphical views that help inspecting and visualizing the design.

There are basically two types of analysis methods and tools:

Generating reports for design object properties

Reporting commands generate textual reports for various design objects: timing and area, cells,
clocks, ports, buses, pins, nets, hierarchy, resources, constraints in the design, and so on.

Visualizing design objects (Design Vision)

Some design objects and their properties can be analyzed graphically. You may examine for
example the design schematic and explore the design structure, visualize critical and other timing
paths in the design, generate histograms for various metrics and so on.

These methods and tools are used to verify that the design meets the goals set by the designer and
described with design constraints. If the design does not meet a design goal then the analysis
methods can help determining the cause of the problem.

Save Design

The final task in synthesis with Design Compiler is to save the synthesized design. The design can
be saved in many formats but you should save for example the gate-level netlist (usually in
Verilog) and/or the design database. Remember that by default, Design Compiler does not save
anything when exiting.
Logic synthesis transforms RTL code into a gate-level netlist: RTL Verilog is converted into structural Verilog.

The process and steps

1. Translation
 Check RTL for valid syntax.
 Transform RTL into un-optimized generic (GTECH) gates.
 Apply parameter (generic) values.
 Replace arithmetic operators with DesignWare components.
 Link all the parts of the design.
2. Optimization and Mapping
 Optimization and mapping are driven by the constraints and the cell library.
 Choose the best DesignWare implementation.
 Factor out common logical sub-expressions and share terms.
 Flatten logic into a 2-level realization (hurts area).
 Map logic onto the best-fit implementation for the given cell library.
 Iterate to find the "best" realization.
Floorplanning:
Floorplanning includes macro/block placement, pin placement, power planning, and power grid
design. What makes the job more important is that the decisions taken for macro/block placement,
I/O-pad placement, and power planning directly or indirectly impact the overall implementation
cycle.

Lots of iterations happen to get an optimum floorplan. The designer takes care of the design
parameters, such as power, area, timing, and performance during floorplanning. These estimations
are repeatedly reviewed, based on the feedback of other stakeholders such as the implementation
team, IP owners, and RTL designers. The outcome of floorplanning is a proper arrangement of
macros/blocks, power grid, pin placement, and partitioned blocks that can be implemented in
parallel.

The first rule of thumb for floorplanning is to arrange the hard macros and memories in such a manner that you end up with a core area that is square in shape. This is not always possible, however, because of the large number of analog IP blocks, memories, and various other requirements in the design.
Pad limited design:

The area of the I/O pads limits the size of the die; the number of I/O pads is large. If die area is a constraint, staggered I/O pads can be used.

Core limited design:

The area of the core limits the size of the die; the number of I/O pads is smaller. In these designs, in-line I/Os can be used.

The following are decided at the floorplanning stage:

1. Die size and core size of the chip.
2. Macro placement.
3. I/O pad placement.
4. Power plan.
5. Row configuration.

In simple words, power planning and macro placement together are known as floorplanning. Apart from this, the aspect ratio of the core, utilization of the core area, cell orientation, and core-to-I/O clearance are also taken care of during the floorplan stage.

Inputs to Floorplan:
 Synthesized netlist (.v, .vhdl)
 Design constraints (SDC)
 Physical partitioning information of the design
 I/O placement file (optional)
 Macro placement file (optional)
 Floorplanning control parameters

Outputs from Floorplan:
 Die/block area
 I/O pads placed
 Macros placed
 Power grid design
 Power pre-routing
 Standard cell placement areas
Floorplan Control Parameters

The floorplan can be controlled either through the aspect ratio or through explicit dimensions:

1. Aspect Ratio:
 Core utilization
 Aspect ratio (H/W)
 Row/core ratio

2. Width and Height:
 Width
 Height
 Row/core ratio
1. Aspect Ratio (Ar):

The aspect ratio is the ratio of vertical routing resources to horizontal routing resources. If you specify a ratio of 1.00, the height and width are the same and therefore the core is a square. If you specify a ratio of 3.00, the height is three times the width.

Aspect Ratio (Ar) = Core Height / Core Width

2. Core Utilization (Cu):

It indicates the amount of core area used for cell placement. The number is calculated as a
ratio of the total cell area (for hard macros and standard cells or soft macro cells) to the
core area. A core utilization of 0.8, for example, means that 80% of the core area is used
for cell placement and 20 percent is available for routing.

Core Utilization (Cu) = (Standard cell area + Macro area) / Core area

3. Row to Core Ratio (Rcr):

It indicates the amount of channel space to provide for routing between the cell rows. The
smaller the number, the more space is left for routing. A value of 1.0 leaves no routing
channel space.

Row/Core Ratio (Rcr) = Cell row area / (Core width × Core height)
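The aspect ratio and core utilization definitions above can be illustrated with a short calculation (dimensions and areas are hypothetical):

```python
# Floorplan control parameters from the definitions above, computed
# for hypothetical core dimensions and cell areas.
core_width = 1000.0          # um (assumed)
core_height = 800.0          # um (assumed)
std_cell_area = 450_000.0    # um^2 (assumed)
macro_area = 190_000.0       # um^2 (assumed)

core_area = core_width * core_height
aspect_ratio = core_height / core_width                       # Ar = H / W
core_utilization = (std_cell_area + macro_area) / core_area   # Cu

print(aspect_ratio)      # 0.8
print(core_utilization)  # 0.8 -> 80% of core used, 20% left for routing
```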
Total utilization T(F) of floorplan F is derived using the following equation:

T(F) = [A(m) + A(p) + A(s)] / A(F)

Where, A(m) = area occupied by macros,

A(p) = area occupied by pads,

A(s) = area occupied by standard cells,

A(F) = total floorplan area.

Cell row utilization C(F) of floorplan F is approximated using the following equation:

C(F) = A(s) / A(R − union(B, E, m, p))

Where, R = all cell rows,

B = all placement blockages,

E = exclusive regions.

Macro Placement:

Macro placement is done manually, based on connectivity with other macros and with I/O pads. Fly (flight) lines are virtual connections between macros, and between macros and I/O pads. They give the designer an idea of the logical connections between macros and pads, and act as guidelines that help the designer reduce the routing resources used.

 First check the flylines, i.e. check the net connections from macro to macro and from macro to standard cells.
 If there are many connections from macro to macro, place those macros near each other, preferably near the core boundary.
 If an input pin is connected to a macro, it is better to place that macro near that pin or pad.
 If a macro has many connections to standard cells, spread the macros inside the core.
 Avoid criss-cross placement of macros.
 Use soft or hard blockages to guide the placement engine.

Based on the connections between macros and I/O pads/pins, fly lines are of three types:

1. Macro to Macro flylines.


2. Macro to I/O flylines.
3. Pin to Pin flylines.

1. Macro to Macro flylines:


As shown in figure (1) below, when two macros are selected for macro-to-macro flylines, the total number of connections between them is shown. This gives the designer an idea of which two macros should be placed closer together. Hence, the two macros are placed closer to each other as shown in figure (1) (right side).
Figure (1): Macro to Macro flylines
2. Macro to I/O flylines:

Figure (2): Macro to I/O Flylines


As shown in figure (2) above, when macro to I/O port pin flylines are selected, the total number of connections between the macro and the I/O pins is shown. This helps the designer identify the macros to be kept at the corners of the die or block. Hence the macro is placed closer to the periphery.

3. Pin to Pin flylines:


If two macros are selected for pin-to-pin fly lines, the virtual connections are shown with much more precision, down to the exact pin-to-pin connections. This guides the designer in choosing an appropriate cell orientation (figure (3)) for the macros, which results in efficient routing.

Figure 3 (a): Pin to Pin Flyline

Figure 3 (b): Cell Orientation (MY 90) based on Pin to Pin Flyline
Tie Cells Insertion.

Tie Cells:

Tie cells are special-purpose standard cells whose output is constant high or constant low. These cells are used to hold (tie) the inputs of other cells that must be connected to constant high (Vdd) or constant low (Vss).

Tie High Cell:

A Tie High cell is a special-purpose standard cell whose output is constant high (Vdd).

Tie Low Cell:

A Tie Low cell is a special-purpose standard cell whose output is constant low (Vss).

There are some unused inputs in the design netlist. These unused inputs should not be left floating; they should be tied to either power (Vdd) or ground (Vss). Inputs that must be connected to Vdd are connected to Tie High cells; inputs that must be connected to Vss are connected to Tie Low cells. This is the purpose of tie cells in the design.

Why Tie cells are inserted?

In lower technology nodes, the gate oxide of the transistor is very thin and sensitive to voltage fluctuations in the power supply. If the gate of a transistor is directly connected to the power/ground network (power grid), the gate oxide might be damaged by voltage fluctuations in the power supply. To overcome this, tie cells are inserted.

To perform placement optimization (physical optimization), automatic insertion and optimization of tie-offs are required in the design. The following commands are used to execute this:

set_auto_disable_drc_nets -constant false

set_physopt_new_fix_constants true

set_attribute [...] max_fanout 12

set_attribute [...] max_capacitance 0.2 -type float


Tie-off Optimization:

Tie-cell optimization means using a tie cell to hold (tie) as many inputs as possible at a given logic level, while meeting the specified maximum fanout and maximum capacitance constraints (logical DRCs).

The set_auto_disable_drc_nets command enables DRC on constant nets.

Setting the set_physopt_new_fix_constants variable to true causes the placement tool to observe the maximum capacitance constraint during tie-off optimization.

The maximum capacitance constraint is determined by the max_capacitance attribute, which can
be set with the set_max_capacitance or set_attribute command. The set_attribute command can
be used to specify explicitly both the maximum fanout and maximum capacitance constraints for
objects in the design.

NOTE: We can also insert tie cells manually with the connect_tie_cells command. The command inserts tie cells and connects them to the specified cell ports, while meeting the maximum fanout and maximum wire length specified in the command.

Blockages

Blockages are specific locations where the placement of cells is prevented or blocked. They act as guidelines for placing standard cells in the design. A blockage does not guide the placement tool to place standard cells in a particular area; rather, it prevents the tool from placing standard cells at the specified locations. In this way, blockages act as guidelines to the placement tool.

Blockages are of the following types:

 Soft (non-buffer) blockages
 Hard (std cell) blockages
 Partial blockages
 Placement blockages
 Routing blockages
Soft Blockages:

Soft Blockage specifies a region where only buffers can be placed. That means standard cells
cannot be placed in this region. It blocks (prevents) the placement tool from placing non-buffer
cells such as standard cells in this region.

Hard Blockages:

Hard blockage specifies a region where all standard (std) cells and buffers cannot be placed. It
prevents the placement tool from placing std cells and buffers in this region.

Hard blockages are mostly used to,

 Block std cells to certain regions in the design


 Avoid routing congestion at macro corners
 Control power rail generations at macro corners

Partial Blockages:

The blockage factor for any blockage is 100% by default, so no cells can be placed in that region. Flexibility can be obtained with partial blockages: to reduce placement density without blocking 100% of the area, change the blockage factor of an existing region to a value less than 100%.

Placement Blockage:

A placement blockage prevents the placement tool from placing cells in specific regions. Placement blockages are created at the floorplanning stage.

Placement blockages are used to,

 Define standard cells and Macro Area


 Reserve channels for buffer insertion
 Prevent cells from being placed nearer to macros
 Prevent congestion near macros
Routing Blockage:

Routing blockages block routing resources on one or more layers. It can be created at any point in
the design.

HALO (Keep-Out Region):

i. A HALO is the region around the boundary of a fixed macro in the design in which no other macro or standard cells can be placed. A halo does allow placement of buffers and inverters in its area.
ii. Halos of two adjacent macros can overlap.
iii. If a macro is moved from one place to another, its halo moves with it. In the case of blockages, however, if the macros are moved the blockages do not move.

Power Planning - Power Network Synthesis (PNS)

In the ICC Design Planning flow, Power Network Synthesis creates macro power rings and the power grid. PNS automates power topology definition, calculation of the width and number of power straps to meet IR-drop constraints, detailed P/G connections, and via placement.

There are two types of power planning and management: core cell power management and I/O cell power management. In core cell power management, VDD and VSS power rings are formed around the core and macros. In addition, straps and trunks are created for macros as per the power requirements. In I/O cell power management, power rings are formed for I/O cells and trunks are constructed between the core power ring and the power pads. A top-to-bottom approach is used for the power analysis of a flattened design, while a bottom-up approach is suitable for macros.

The power information can be obtained from the front end design. The synthesis tool reports static
power information. Dynamic power can be calculated using Value Change Dump (VCD) or
Switching Activity Interchange Format (SAIF) file in conjunction with RTL description and
test bench. Exhaustive test coverage is required for efficient calculation of peak power. This
methodology is depicted in Figure (1).

For hierarchical designs, budgeting has to be carried out in the front end, with power calculated for each block of the design. Astro works on a flattened netlist, so the top-to-bottom approach can be used there. JupiterXT can work on hierarchical designs, so the bottom-up approach for power analysis can be used with JupiterXT. IR drops are not found at the floorplanning stage. At the placement stage, the rails get connected to power rings, straps, and trunks; IR drop then comes into the picture, and an improperly designed power network can lead to large IR drops, leaving the core without sufficient power.

Figure (1) Power Planning methodology

Here I discuss the calculation of the width and number of power straps needed to meet EM/IR constraints. Suppose the core voltage is Vdd core = 1.2 volts.

Using the equations below, we can calculate the vertical and horizontal strap widths and the required number of power straps.

1. Calculation of the block current with respect to power:

Iblock = Pblock / Vdd core

Where Pblock = block power,

Vdd core = core voltage

2. Calculation of the current supplied from each side of the block:

Itop = Ibottom = (Iblock / 2) × [Wblock / (Wblock + Hblock)]

Ileft = Iright = (Iblock / 2) × [Hblock / (Wblock + Hblock)]
3. Calculation of power-strap width based on EM:

Wstrap_vertical (= Wstrap_top = Wstrap_bottom) = Itop / Jmetal

Wstrap_horizontal (= Wstrap_left = Wstrap_right) = Ileft / Jmetal

4. Calculation of strap width when IR drop dominates:

Wstrap_vertical ≥ (Itop × Roe × Hblock) / 0.1Vdd

Wstrap_horizontal ≥ (Ileft × Roe × Wblock) / 0.1Vdd

5. Partition the power straps into power refreshes:

For better utilization of the routing channels, select a refresh width of (3 × routing pitch + minimum metal6 width) = (3 × 0.59 μm + 0.25 μm) ≈ 2 μm in the vertical, and the same in the horizontal.

Taking block A as an example, the number of Vdd/Vss refreshes is:

Nrefresh_horizontal = Wstrap_horizontal / Wrefresh
Nrefresh_vertical = Wstrap_vertical / Wrefresh

The spacing of each refresh would be:

Srefresh_horizontal = Hblock / Nrefresh_horizontal
Srefresh_vertical = Wblock / Nrefresh_vertical

6. Calculate the required number of core power/ground pads:

If each power/ground pad can sustain 25 mA of current and Pcore = 630 mW:

Npad_core = (Pcore / Vddcore) / Icore_power_pad
          = (630 / 1.2) / 25
          = 21
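The sizing steps above can be sketched as a short calculation. The block power, dimensions, and EM current density are assumed values; the pad-count figures (630 mW, 1.2 V, 25 mA) come from the text:

```python
# Power-strap and power-pad sizing following the steps above.
p_block = 0.5          # block power, W (assumed)
vdd_core = 1.2         # core voltage, V
w_block, h_block = 1000.0, 800.0   # block dimensions, um (assumed)
j_metal = 1.0e-3       # max EM current density, A per um of width (assumed)

# 1. Block current
i_block = p_block / vdd_core
# 2. Current supplied from each side
i_top = (i_block / 2) * (w_block / (w_block + h_block))
i_left = (i_block / 2) * (h_block / (w_block + h_block))
# 3. Strap widths based on EM
w_strap_vertical = i_top / j_metal
w_strap_horizontal = i_left / j_metal

# 6. Required number of core power/ground pads (values from the text)
p_core = 0.630         # 630 mW
i_pad = 0.025          # 25 mA per pad
n_pad_core = (p_core / vdd_core) / i_pad
print(round(n_pad_core))  # 21
```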

7. Core Power Estimation

The following equations provide a simple method to estimate the dynamic power and leakage power of the combinational cells in the core area:

Pdynamic = Pcomb × F × Scomb × Ncomb

Where, Pcomb is the power per MHz per gate,

F is the working frequency,
Scomb is the switching activity of the combinational logic,
Ncomb is the number of gates.

Pstatic = Pleakage × Ncomb

Where, Pleakage is the average leakage power per gate,

Ncomb is the number of gates.

Consider:

Gate count of combinational logic is 160K gates
The working frequency is 27 MHz
Switching activity is 0.2

Then the dynamic power consumption in the combinational logic is:

Pdynamic = Pcomb × F × Scomb × Ncomb
         = 12.35 nW/MHz × 27 MHz × 0.2 × 160K
         = 10.67 mW

The leakage power consumption in the combinational logic is:

Pstatic = Pleakage × Ncomb
        = 0.756 nW × 160K
        = 0.121 mW
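The worked example above can be reproduced with a few lines of arithmetic:

```python
# Reproducing the core power estimate from the text.
p_comb = 12.35       # nW per MHz per gate (from the text)
f_mhz = 27.0         # working frequency, MHz
s_comb = 0.2         # switching activity
n_comb = 160_000     # gate count
p_leakage = 0.756    # average leakage per gate, nW

p_dynamic_nw = p_comb * f_mhz * s_comb * n_comb
p_static_nw = p_leakage * n_comb

print(p_dynamic_nw / 1e6)  # ~10.67 mW
print(p_static_nw / 1e6)   # ~0.121 mW
```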
1. I have a netlist consisting of 500k gates and have to estimate the die area for floorplanning. How do I go about it?

Ans: There are 2 methods to estimate die area

Method 1:

Each cell has its area according to a specific library. Go through all your cells and multiply each cell count by its corresponding area from your vendor's library. Then apply a density factor - usually, for a standard design, you should have around 80% density after placement. From this data you can estimate your required die area.
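Method 1 can be sketched as follows; the cell names, counts, and per-cell areas are hypothetical:

```python
# Method 1 sketch: estimate core area from summed cell areas and a
# target density. Cell names, counts, and areas are hypothetical.
cells = {
    "NAND2_X1": (200_000, 1.06),   # (count, area in um^2) - assumed
    "DFF_X1":   (120_000, 4.52),
    "INV_X1":   (180_000, 0.53),
}
target_density = 0.80   # ~80% utilization after placement

total_cell_area = sum(count * area for count, area in cells.values())
required_core_area = total_cell_area / target_density
print(round(required_core_area))  # estimated core area in um^2
```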

Method 2:

Another way of doing it is to load the design in the implementation tool and change the floorplan (x and y coordinates) in such a way that the starting utilization is around 50% to 60%. Again, this depends on the netlist quality and completion status (e.g., netlist 75%, 80%, or 90% complete).

2. How to do floor planning for multi Vdd designs?

Ans: First decide on the power domains, then add power rings for each domain and add stripes to supply power to the standard cells.

3. What is core utilization percentage?

Ans: Core utilization percentage indicates the amount of core area used for cell placement. The number is calculated as a ratio of the total cell area (for hard macros and standard cells or soft macro cells) to the core area. A core utilization of 0.8, for example, means that 80% of the core area is used for cell placement and 20% is available for routing.

4. When core utilization area increased to 90%, macros got placed outside core area so
does it mean that increase in core utilization area decreases width and height?
Ans: If you go with 90%, there may be congestion and routing problems; it means you cannot complete routing within this area. Sometimes the design fits at 90% utilization, but timing optimization steps such as upsizing cells and adding buffers increase the area, and at that point nothing more can be done, so you need to go back to the floorplan again. To be on the safe side, we fix utilization at 70 to 80%.

5. Why do we remove all placed standard cells, and then write out floorplan in DEF
format. What's use of DEF file?

Ans: The DEF deals only with the floorplan. To get an abstract of the floorplan, we write it out this way; by saving and loading this file we can get the abstract back, so we do not need to redo the floorplan.

6. Can area recovery be done by downsizing cells at path with positive slack?

Ans: Yes, area recovery can be done by downsizing cells on paths with positive slack. Deleting unwanted buffers also helps in area recovery.

7. We can manipulate IR drop by changing number of power straps. I increased power


straps which reduced IR drop, but how many power straps can I keep adding to
reduce IR drop? How to calculate number of straps required. What problems can
arise with increase in number of straps?

Ans: We can use tools (e.g., VoltageStorm, RedHawk) to calculate the IR drop, and if the drop is high we can add straps based on the results. If you do projects repeatedly, you come to know how many straps are enough, and in that case you will not need the tools. There is a hand calculation, but it is approximate rather than accurate. Too many straps create problems in routing and also affect area, so the result will be routing congestion.

8. aprPGConnect is used for the logical connection of all VDD/VSS nets of all modules. So how do we connect all VDD/VSS to the global VDD/VSS nets before placement?

Ans: aprPGConnect makes the logical connection of all VDD/VSS nets of all modules. For the physical connection, you can use the axgCreateStandardcellRails command to create the standard cell rails, which connect to the rings or the straps, depending on the power delivery design.

9. A design has memory and analog IP. How do we arrange the power and ground lines in
the floorplan? Are digital and analog power lines separated? Is it important to design the
power-ground plan on an ASIC?

Ans: Basically you have to keep the analog and digital rails isolated from one another.
All hard macros and memory blocks need a vdd/vss pair ring around them; memories are
usually placed on the sides or corners of the chip. Put a pair of vdd/vss rings around
your design (usually called the core power ring). Create a vertical vdd/vss pair every 100
microns; these are the power straps, and each taps into the core power ring on either side.
Put a vdd/vss pair around every analog block, strap these analog rings (using a vdd/vss pair),
and run them to your package vdd/vss rings.

Keep in mind that wherever a digital vdd/vss crosses the analog vdd/vss straps, you need
to cut the digital vdd/vss on either side of the analog crossing to isolate the analog from digital
noise. You also need to dedicate pins on your chip for analog power and ground. Now we come to
the most time-consuming part: HOW THICK SHOULD YOU MAKE all these
rings/straps? The answer is technology dependent. Look into the packaging
documentation; it usually has guidelines for calculating the thickness of your power
rings, and some vendors even provide applications that calculate all this for you and make the
cuts for the analog/digital crossings.

10. In my design, the core PG ring and straps were implemented on M6/M7, with the
vertical straps on M6. I use the default method to connect the M6 straps down to the
standard cell rails on M1, so the via stacks from V12, V23, ... up to V56 block routing on
M2 through M6 and increase congestion to some extent. Is there a good method to
avoid congestion when adding straps or connecting straps to the standard cell rails?

Ans: In Synopsys ICC there is a command that controls standard cell utilization under the
power straps. Using it you can keep channels between standard cells that pass through the
stacked vias; this limits the detours caused by the via stacks and allows a more uniform cell
placement, resulting in reduced congestion. In SoC Encounter, the
command setPrerouteAsObs can be used to control standard cell density under power straps.
But a 100% via connection from M1 to M6 under a wide strap will still block other nets'
routing.

11. How do we control via generation when doing special route for standard cells, for
example to reserve gaps between vias for other nets' routing?

Ans: To remove those stacked vias you need to:

I. Either go back to the floorplan step, where the power straps and power/ground preroute
vias are dropped. Vias are normally dropped regularly to reduce power and ground
resistance, so the maximum number of vias is dropped over the power/ground nets.
Check your floorplan scripts: the via dropping should come after horizontal
and vertical power strap generation on M6 and M7.
II. If the vias to be removed are in specific regions, you can delete them at any step, but
before global routing of course, so that the global router is aware of the
resources/obstructions. In this case, since you increase the power/ground resistance, you
should confirm this method's validity with IR drop analysis.
III. If IR drop is an issue, another option is placing standard-cell placement
percentage blockages (Magma has percentage blockages, which are good at reducing
congestion). This is the safest method, as you no longer need to delete those stacked M1-to-
M5 vias. However, since you reduce placement density, it costs you some unused area.
12. How to do a good floor plan and power stripes with blocks?

Ans: A good floorplan is made when:

Minimum space is lost between macros/rows,

Macros are placed close to their related logic,

IR drop/electromigration is within limits,

Routing congestion is minimal.

13. How to reduce congestion?


Ans: Congestion can be reduced by adding placement blockages and routing blockages
during the floorplan. A placement blockage avoids unnecessary cell placement between
macros and in other critical areas. A routing blockage tells the global router not to
route anything in a particular area. People often change/modify the blockages
according to their needs.

Normally routing blockages should be placed before global routing to force the global router to
respect them. Most place-and-route tools run the first global routing at the placement
step and then update it incrementally, so add blockages before placement. Otherwise,
if you add them after any global/detail routing is done, you may need to update the global
routing first (possibly incrementally).

14. How to find the reason for congestion in particular region? How to reduce
congestion?

Ans: First analyze the placed, congested database and find the hot spots that are highly
congested.

Case 1: "Congestion in the channel between macros"

Reason: Not enough tracks are available in the channel to route the macro pins, or the channel
is highly congested because of standard cell placement.

Solution: Increase the channel width between the macros, or make sure that soft or hard
blockages are properly placed.

Case 2: "Congestion at macro corners"

Reason: The corners of a macro are very prone to congestion because they have connectivity
from both directions.

Solution:

i. Place a halo around each macro (5-7 um).

ii. Place a hard blockage on the macro corners (corner protection). This hard placement
blockage should be created after standard cell rail creation; otherwise it won't allow
standard cells inside it.
Case 3: "Congestion in the center of the chip / congestion in a module anywhere in the chip"

Reason: Congestion in standard cells or a module is driven by the module's local density (local
density is very high, 95%-100%) and by the module's nature (highly connected logic), or by
too small a die area.

Solution:

i. Module density should be even across the whole chip (on the order of 65-85%).

ii. Use density screens/partial blockages to control module density in specific areas.
iii. Use cell padding.
iv. If congestion is too severe, the chip area should be increased based on the
congestion map.

15. What are the reasons for the Routing congestion in a design?

Ans: Routing congestion can be due to:

i. High standard cell density in small area.


ii. Placement of standard cells near macros.
iii. High pin density on one edge of block.
iv. Placing macros in the middle of floorplan.
v. Bad Floorplan.
vi. Placement of complex cells in a design.
vii. During I/O optimization the tool does buffering, so a lot of cells sit at the core area.

16. What actually happens in power planning? What is the main aim of power planning?

Ans: The main aim of power planning is to ensure that all the cells in the design get
sufficient power for proper functioning. During power planning, power rings
and power straps are created to distribute power equally across the design.
Power straps provide a regulated power supply throughout the block or chip. The number
of straps depends on the voltage and current requirements of your design. You must design
the power grid so that it provides equal power from all sides of the block. You can also use
early rail analysis to determine the IR drop in your block and lay down sufficient power stripes.

17. How power stripes are useful in power planning?

Ans: If the chip is large, the core rings alone cannot supply the standard cells because
of the long distance, particularly the cells in the center of the chip (the farthest cells would
see a high IR drop); then you need stripes. The number of stripes depends on the area of your chip.

18. What is the minimum space between two macros? How can we find the minimum spacing
between macros?

Ans: The distance between macros = (no. of pins of the macros * pitch * 2) / no. of available
routing layers.

For example, suppose the design has 2 macros with 50 pins each, a pitch of 0.5,
and 8 available metal layers.

Then the space between the macros = ((50 + 50) * 0.5 * 2) / 8 = 12.5 um.
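The rule of thumb above can be wrapped in a small function for quick checks; the pin counts, pitch, and layer count below are just the example's values:

```python
# Channel-spacing rule of thumb from the text:
# spacing = (total pins crossing the channel * pitch * 2) / routing layers

def macro_channel_width(pins_a, pins_b, pitch_um, routing_layers):
    """Minimum channel width between two macros, in the pitch's units."""
    return (pins_a + pins_b) * pitch_um * 2 / routing_layers

# Two macros with 50 pins each, 0.5 um pitch, 8 routing layers
print(macro_channel_width(50, 50, 0.5, 8))  # -> 12.5 (um)
```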

19. What steps need to be taken care of while doing floorplanning?

Ans: Die Size Estimation.

 Pin/pad location.
 Hard macro placement.
 Placement and routing.
 Location and area of the soft macros and its pin locations.
 Number of pads and its location.

Note: For block level, the die size and pin placement come from the top level.

 Fly-line analysis is required before placing the macros.


 While fixing the location of the pin or pad always consider the surrounding
environment with which the block or chip is interacting. This avoids routing
congestion and also benefits in effective circuit timing.
 Provide sufficient number of power/ground pads on each side of the chip for
effective power distribution.
 In deciding the number of power/ground pads, the IR-drop report of the design
should also be considered.
 Orientation of these macros forms an important part of floorplanning.
 Create a standard cell placement blockage (hard blockage) at the corners of the macros,
because these parts are more sensitive to routing congestion.
 Use the proper aspect ratio (width/height) for the chip.

For placing block-level pins:

 First determine the correct layer for the pins.


 Spread out the pins to reduce congestion.
 Avoid placing pins in corners where routing access is limited.
 Use multiple pin layers for less congestion.
 Never place cells within the perimeter of hard macros.
 To keep from blocking access to signal pins, avoid placing cells under power straps
unless the straps are on metal layers higher than metal2.
 Use density constraints or placement-blockage arrays to reduce congestion.
 Avoid creating any blockage that increases congestion.
20. What is the difference between standard cells and IO cells? Is there any difference in
their operating voltages? If so why is it?

Ans:

 Standard cells are logic cells, whereas IO cells interface between the core and the outside
world.
 IO cells contain protection circuits, e.g. against short circuit and over-voltage.

There is a difference between the core operating voltage and the IO operating voltage; it
depends on the technology library used. For a generic 130 nm library, the core voltage is 1.2 V
and the IO voltage is 2.5/3.3 V.
21. What is the significance of simultaneous switching output (SSO) file?
Ans: SSO: The abbreviation of “Simultaneously Switching Outputs”, which means a
certain number of I/O buffers switching at the same time in the same direction (H -> L,
HZ -> L or L -> H, LZ -> H). This simultaneous switching causes noise on the power/ground
lines because of the large di/dt value and the parasitic inductance of the bonding wires on the
I/O power/ground cells.

SSN: The noise produced by simultaneously switching output buffers. It changes the
voltage levels of the power/ground nodes and is the so-called “ground bounce” effect. This
effect is tested at the device output by keeping one output stable at low “0” or high “1”, while
all other outputs of the device switch simultaneously. The noise that occurs at the stable output
node is called “Quiet Output Switching” (QOS). If the input low voltage is defined as Vil, the
QOS at “Vil” is taken as the maximum noise that the system can endure.

DI: The maximum number of copies of a specific I/O cell switching from high to low
simultaneously without making the voltage on a quiet output at “0” rise above “Vil”, when a
single ground cell is applied. We take the QOS at “Vil” as the criterion in defining DI because
“1” has more noise margin than “0”. For example, in the LVTTL specification, the margin
from “Vih” (2.0 V) to VD33 (3.3 V) is 1.3 V in the typical corner, which is higher than the
margin from “Vil” (0.8 V) to ground (0 V).

DF: “Drive Factor” is the amount that a specific output buffer contributes to the SSN on
the power/ground rail. The DF value of an output buffer is proportional to dI/dt, the
derivative of the current through the output buffer. We can obtain DF as: DF = 1 / DI.

22. Differentiate between a Hierarchical Design and flat design?


Ans:
 Hierarchical design has blocks, sub blocks in a hierarchy; Flattened design has no sub
blocks and it has only leaf cells.
 Hierarchical design takes more run time; Flattened design takes less run time.

23. What is TDF?


Ans: Top Design Format (TDF) files provide Astro with special instructions for planning,
placing, and routing the design. TDF files generally include pin and port information; Astro
particularly uses the I/O definitions from the TDF file in the starting phase of the design flow
[1]. Corner cells are simply dummy cells which have only ground and power layers. The TDF file
used for SAMM is given below. The SAMM IC has 80 I/O pads in total, of which 4 are dummy
pads. Each side of the chip has 20 pads, including 2 sets of power pads. The number of power
pads required for SAMM is calculated in the power planning section. The design is pad limited
(the pad area is larger than the cell area) and inline bonding (same I/O pad height) is used.

24. Is there any checklist to be received from the front end related to switching activity
of any nets to be taken care of at the floorplanning stage?
Ans: Yes. The switching activities of the macros are available in the checklist, which also
contains the power consumption of each macro at different frequencies.

25. What is power trunk?


Ans: A power trunk is the piece of metal that connects the IO pad to the core ring.

26. How to handle hotspot in a chip?


Ans: Increasing the number of power straps, or increasing their width, helps reduce a hot
spot created by voltage drop and keeps the voltage drop below 10%.

27. What is power gating?


Ans: Power gating is a power reduction technique. It saves power by shutting off the supply
to a particular area of the chip when it is not in use.

28. Is the macro power ring mandatory or optional?


Ans: For a hierarchical design the macro power ring is mandatory; for a flat design the macro
power ring is optional.
29. If you have both IR drop and congestion how will you fix it?
Ans:
 Spread macros
 Spread standard cells
 Increase strap width
 Increase number of straps
 Use proper blockage

30. Are increasing the power line width and providing more straps the only
solutions to IR drop?
Ans:
 Spread macros.
 Spread standard cells.
 Use proper blockage.
31. What are tie-high and tie-low cells, and where are they used?
Ans: Tie-high and tie-low cells are used to connect the gate of a transistor to either power
or ground. In deep sub-micron processes, if a gate were connected directly to power/ground, the
transistor might be turned on/off by power or ground bounce, so the foundry recommendation is
to use tie cells for this purpose. These cells are part of the standard-cell library. Inputs that
require Vdd connect to a tie-high cell (so the tie-high acts as a power supply cell), while inputs
that require Vss connect to a tie-low cell.
32. How to insert Tap Cells?
Ans: Tap cells are special nonlogic cells with well and substrate ties. They are typically
used when most or all of the standard cells in the library contain no substrate or well taps.
Generally, the design rules specify the maximum distance allowed between every transistor in
a standard cell and a well or substrate tie.
You can insert tap cells in your design before or after placement:
 You can insert tap cell arrays before placement to ensure that the placement complies
with the maximum diffusion-to-tap limit.
 You can insert them after placement to fix maximum diffusion-to-tap violations.
Adding Tap Cell Arrays

Before global placement (during the floorplanning stage), you can add tap cells to the design
that form a two-dimensional array structure to ensure that all standard cells placed
subsequently will comply with the maximum diffusion-to-tap distance limit.
You need to specify the tap distance and offset, based on your specific design rule distance
limit. The command has no knowledge of the design rule distance limit. After you run the
command, it is recommended that you do a visual check to ensure that all standard cell
placeable areas are properly protected by tap cells.

 Every other row – Adds tap cells in every other row (in the odd rows only). This pattern
reduces the number of added tap cells by approximately half, compared to the normal
pattern.
The distance value should be approximately twice that of the distance value specified
in the design rule.
Fill boundary row/Fill macro blockage row – Fills the section of a row that is adjacent
to the chip boundary or the macro/blockage boundary to avoid tap rule violation (the
default). When deselected, the section of the row adjacent to the chip boundary or the
macro/blockage boundary might need to rely on taps outside the boundary to satisfy
the tap distance rule.
 Stagger every other row – Adds tap cells in every row. Tap cells on even rows are offset
by half the specified offset distance relative to the odd rows, producing a checkerboard-
like pattern. Make sure you enter the offset distance to be used.
The distance value should be approximately four times that of the distance value
specified in the design rule.
Boundary row double density/Macro blockage row double density – Doubles the tap
density on the section of a row that is adjacent to the chip boundary or the
macro/blockage boundary to avoid tap rule violation (the default). When deselected,
the section of the row adjacent to the chip boundary or the macro/blockage boundary
needs to rely on taps outside the boundary to satisfy the tap distance rule.
 Normal – Adds tap cells to every row, using the specified distance limit. The distance
value should be approximately twice the distance value specified in the design rule.
 Control the tap cell placement using the following options:

 Ignore soft blockages - Ignores soft blockages during tap cell insertion. The default is
false.
 Ignore existing cells - Ignores any standard cells already placed. The default is false.
When this option is selected, tap cell placement may overlap existing standard cells.
 At distance tap insertion only - When selected, tap cells are inserted at distance d or at
d/2 only. The distance specified with the -distance option is d, and the default is false.
With this option, tap cells are placed uniformly but might cause DRC violations.
Tap distance-based
This method is typically used when the internal layout of a standard cell is not available.
The command uses a simple distance model where the specified distance from a standard
cell to a tap cannot be violated. Type the tap distance limit, or keep the default.
DRC spacing-based:
This method is used when the internal layout of a standard cell is available. The command
reads the well, diffusion, and contact layers of the standard cell layout and, by using the
intersection of the given layers, identifies the p- and n-transistor diffusion area and the
substrate and well contact locations. Also, a tap inside a standard cell or tap cell can be
used by the transistor diffusion area of another standard cell. This method makes the most
efficient use of the taps and results in fewer taps being inserted. Specify the maximum
distance design rule from a p- or n-diffusion to a substrate or well tap. Select the names of
the following layers, as needed: n-well layer, n-diffusion layer, p-well layer, p-diffusion
layer, and contact layer.
 Freeze standard cell – Does not move standard cells. In this method, a higher number
of tap cells might need to be inserted, and the resulting placement might not be free of
DRC violations.
 Allow moving standard cells – Moves standard cells to avoid overlapping with tap
cells. In this method, the timing can sometimes be affected.
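The tap-cell patterns described above can be sketched geometrically. This is an illustrative model only (a hypothetical `tap_positions` helper with made-up row counts and distances), not an actual tool command:

```python
# Per-row tap x-coordinates for the "normal", "every other row", and
# "stagger every other row" patterns described in the text.

def tap_positions(row_width, distance, pattern="normal", offset=0.0):
    """Return a list of tap x-coordinate lists, one per placement row."""
    rows = []
    for row in range(4):  # sketch four placement rows
        if pattern == "every_other" and row % 2 == 1:
            rows.append([])  # odd rows get no taps in this pattern
            continue
        # Stagger: odd rows are shifted by the given offset (checkerboard)
        start = offset if (pattern == "stagger" and row % 2 == 1) else 0.0
        xs, x = [], start
        while x <= row_width:
            xs.append(x)
            x += distance
        rows.append(xs)
    return rows

# Staggered pattern: even rows at 0, 20, 40; odd rows shifted to 10, 30
print(tap_positions(40.0, 20.0, "stagger", offset=10.0))
```

The checkerboard is why the stagger pattern tolerates roughly twice the row distance of the normal pattern: a cell in an odd row can reach taps in the adjacent even rows.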
Q. What is IR Drop Analysis? How does it affect the timing?

Ans: The power supply in the chip is distributed uniformly through metal layers (Vdd and Vss)
across the design. These metal layers have a finite amount of resistance. When voltage is applied,
current starts flowing through the metal layers, and some voltage is dropped due to the resistance
of the metal wires and the current. This drop is called IR drop. For example, if a design needs to
operate at 2 volts and has a tolerance of 0.4 volts on either side, we need to ensure that the
voltage across its power pin (Vdd) and ground pin (Vss) does not fall short of 1.6 volts. The
acceptable IR drop in this context is 0.4 volts; that is, the design can tolerate up to a 0.4 volt
drop without affecting its timing and functionality.
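The budget in the example can be checked with Ohm's law. The rail resistance and current below are illustrative assumptions, chosen so the drop lands inside the 0.4 V budget:

```python
# IR-drop budget check: V_drop = I * R must stay within the tolerance.

vdd_nominal = 2.0      # volts at the supply pad (from the example)
tolerance = 0.4        # acceptable IR drop in volts (from the example)

rail_resistance = 0.8  # ohms of metal between pad and cell (assumed)
cell_current = 0.3     # amps drawn through that rail (assumed)

drop = cell_current * rail_resistance   # Ohm's law: V = I * R, ~0.24 V
effective_vdd = vdd_nominal - drop      # voltage actually seen by the cell

print(drop, effective_vdd, drop <= tolerance)  # within the 0.4 V budget
```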

Q. How does it affect the timing?

Ans: IR drop is a signal integrity (SI) effect caused by wire resistance and the current drawn from
the power (Vdd) and ground (Vss) grids. According to Ohm's law, V = IR. If the wire resistance is
too high, or the current passing through the metal layers is larger than predicted, an unacceptable
voltage drop may occur. Due to this voltage drop, the effective supply voltage
decreases, which means the required power is not reaching the cells across the design. This results
in increased noise susceptibility and poor performance.
The design may have different types of gates operating at different voltage levels. As the voltage
at the gates decreases due to an unacceptable drop in the supply voltage, the gate delays increase
non-linearly. This may lead to setup or hold time violations, depending on which paths those
gates reside in. As the technology node shrinks, the geometries of the metal layers decrease and
their resistance increases, which worsens the drop in the supply voltage. During clock tree
synthesis, buffers and inverters are added along the clock path to balance the skew; voltage drop
on these clock-path buffers and inverters delays the arrival of the clock signal, potentially
resulting in hold violations.
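The non-linear delay increase can be illustrated with the alpha-power-law delay model, a standard first-order model rather than something taken from this text; the Vth, alpha, and 10% droop values below are assumptions:

```python
# Alpha-power-law sketch: delay ~ Vdd / (Vdd - Vth)^alpha.
# As Vdd droops, the (Vdd - Vth) overdrive shrinks faster than Vdd,
# so delay grows more than proportionally.

def gate_delay(vdd, vth=0.4, alpha=1.3, k=1.0):
    return k * vdd / (vdd - vth) ** alpha

nominal = gate_delay(1.2)    # nominal 1.2 V supply
drooped = gate_delay(1.08)   # 10% IR drop
print(drooped / nominal)     # delay grows by more than 10% -> non-linear
```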
Q. What are the tools used for IR drop analysis? At which stage is IR drop analysis
performed?

Ans: Various tools are available for IR drop analysis; VoltageStorm from Cadence and Redhawk
from Apache are the main ones used to show IR drop on a chip. Here we discuss IR drop using
Redhawk. IR drop analysis with Redhawk is possible at different stages of the design flow. It is
better to use Redhawk from the start of the design cycle, when changes are inexpensive and don't
affect the project's schedule; it can then identify and fix power grid problems early. This also
reduces the changes required at the sign-off stage, where the final static and dynamic voltage
(IR) drop analyses are performed. So Redhawk can be used anywhere in the design flow, from the
floorplanning stage through the initial and final cell placement stages.

Q. What are the different types of IR Drop Analysis?

Ans: There are two types of IR Drop Analysis


i. Static IR Drop.
ii. Dynamic IR Drop Analysis.
Static IR drop analysis is vectorless power analysis using average current cycles, whereas
dynamic IR drop analysis is vector-based power analysis using worst-case switching currents.

Q. What are the different reasons for high voltage drop in a design?

Ans: If a design shows high static or dynamic voltage drop, it could be due to one of the
following reasons.
I. High current flowing through the power grid : can affect Static as well as Dynamic IR drop
II. High PG grid impedance: can affect static as well as Dynamic IR Drop.
III. Simultaneous Switching: can affect only dynamic.
IV. Insufficient number of voltage sources: can affect Static as well as Dynamic Drop.
V. High Package parasitics: can affect Static as well as Dynamic Drop.
VI. Inadequate amount of Decaps available: Can affect only dynamic.
Q. How to find whether high IR drop is due to high current flowing through the power
(PG) grid?

Ans: IR drop is a signal integrity (SI) effect caused by wire (metal) resistance and the current
drawn from the power (Vdd) and ground (Vss) grids. Static or dynamic IR drop is proportional to
the current flowing through the power grid: a high average current causes high static IR drop,
and similarly, in dynamic analysis a high transient (switching) current leads to high dynamic IR
drop. Average current is proportional to the average power of the design, so high average power
can result in high static and dynamic voltage drop. The Redhawk power summary report file
(adsRpt/power_summary.rpt) gives the details of the power consumption of the design, broken
down by voltage domain, frequency domain, and cell type.
The instance power file (adsRpt/<design>power.rpt) contains instance-specific power values.
You can also click on any instance in the GUI to get more details of the power calculation for
that instance. In the Redhawk GUI, you can see a sorted list of the highest-power instances in
the design using the Results > List of Highest Power Instances for Static Simulation menu.

Figure 1: Redhawk Power Summary Report


You can use the Power Density Map (PD) (Figure 2) to get the power density distribution in the
design. Similarly, the Instance Power Map (IPM) shows the instance power distribution, and the
Clock Power Map (CPM) shows the power distribution separately for clock-related instances in
the design.
Figure 2: Redhawk Home Page GUI

Average power has both static and dynamic components. The static component is the leakage
power. You can look at the Leakage Power Map (LPM) in the results panel (Figure 2) to see
whether there are any cells with excessive leakage.
From the instance power file (adsRpt/<design>power.rpt) you can find the leakage power
component for any instance.
The dynamic component is contributed by internal power and switching power. This component
is proportional to the frequency, load, and toggle rate. The reason for high dynamic power could
be one of the following:
High switching frequency.
High load capacitance.
High toggle rate, or BLOCK_POWER_FOR_SCALING used in the analysis.
From the instance power file you can get the details of the power calculation. You can analyze
the Instance Frequency Map (IFM) to see whether high frequency is causing high power in some
region.

Figure 3: Power Density Map

The Load Cap map (LC) tells you whether a high load is causing the dynamic component of power.
High-load issues normally happen when you have an unsynthesized clock tree or scan chains, with
some buffers driving a huge fan-out load.
A high toggle rate can also cause high dynamic power. Redhawk derives the toggle rate in one of
several ways; the user can scale the computed power values by scaling the TOGGLE_RATE using the
GSR keyword BLOCK_POWER_FOR_SCALING. Values specified in this section directly affect
the static and dynamic results.
You can also click on any metal or via segment to see the amount of current flowing through the
geometry. Static analysis shows the average current and dynamic analysis shows the peak current;
static analysis also shows the current direction. The current map shows the current distribution
throughout the chip. High transient current can be caused by simultaneous switching in the design.
Q. How to find whether high IR drop is due to high PG impedance?
Ans: High power grid resistance impedes the current flow in the power grid, causing high static
or dynamic voltage drop. You can use the PG Resistance Map (View -> Resistance Maps) to highlight
areas with high PG resistance. You can also write out the PG resistance report using the Redhawk
command “perform gridcheck”. More details on PG weakness analysis can be found in the
application note “Analyzing PG Weakness Results in Redhawk GUI”.
Redhawk has several features for analyzing structural weaknesses in the power grid. You can
use the “View -> Connectivity” menu to analyze PG structural issues such as disconnected
instances, disconnected wires/vias, shorts, and missing vias.
When you highlight disconnected instances in the Redhawk GUI, instances with a VSS disconnect
are highlighted in blue, VDD disconnects in green, and both VDD/VSS disconnects in yellow.
Corresponding text reports are also available inside the adsRpt directory
(adsRpt/*.unconnect, adsRpt/apache.missingVias, etc.).
If there is any major disconnect in the power grid, it will affect the current flow in the design.
You can use the current map (CUR button) to review the current flow through the power grid and
see whether there are any surprises.

Figure 4. Connectivity Analysis


If you are performing RLC extraction on the power grid, high inductance can also cause a high
dynamic drop. You can perform a dynamic analysis based on RC extraction and compare the
results to see whether the L component is causing the high drop.
Low Power Design
Power is a limiting factor affecting performance and features in most important products. When
you decide to buy a mobile phone, what features do you look for? The phone should have a camera
(primary and secondary), 3G/4G support, and so on. Apart from these features, it should be
lightweight (portable) and have a long battery life. Suppose you have to travel a long distance
carrying your phone; if its battery lasts only a few hours, you will hate having to charge it
again and again. Low power design is what makes the battery last a long time. Power management
issues affect every aspect of the design: architecture, design technique, process technology,
design methodology, and software.

Challenges of Low Power:


Lowering Supply Voltage.
Increasing Device Densities as Technology Node Shrinking.
Increasing Clock Frequencies.
Lowering Transistor Threshold Voltage.

Components of Power:
Static Power.
Dynamic Power.
Total power consumption = Static power consumption + Dynamic power consumption

1. Dynamic Power:
Dynamic power has the following characteristics:
It is consumed during the switching of transistors.
It depends on the clock frequency and the switching activity.
It consists of switching power and internal power.
Dynamic power consumption is given by

Pdynamic = Af * Cload * Vdd^2 * f

where Af is the switching activity factor, Cload the switched load capacitance, Vdd the supply
voltage, and f the clock frequency. So dynamic power depends on the load capacitance, clock
frequency, and operating voltage. Dynamic power can be reduced by lowering the operating voltage
(Vdd), lowering the switching activity, and lowering the switched capacitance (Cload).
Load capacitance (Cload) depends on:
Output node capacitance of the logic gate (due to the drain diffusion region).
Total interconnect capacitance (has a larger effect as the technology node shrinks).
Input node capacitance of the driven gate (due to the gate oxide capacitance).
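The dependence on load capacitance, frequency, and voltage can be illustrated with a small calculation using the usual textbook form P_dyn = Af * Cload * Vdd^2 * f; the capacitance, voltage, frequency, and activity numbers below are assumptions for illustration, not values from any library:

```python
# Dynamic power: activity * switched capacitance * Vdd^2 * frequency

def dynamic_power(alpha, c_load_f, vdd_v, freq_hz):
    return alpha * c_load_f * vdd_v**2 * freq_hz

# 10 pF of switched capacitance, 1.2 V supply, 500 MHz, 20% activity
p = dynamic_power(0.2, 10e-12, 1.2, 500e6)
print(p)  # 1.44 mW

# Quadratic Vdd dependence: halving Vdd cuts dynamic power to ~1/4
ratio = dynamic_power(0.2, 10e-12, 0.6, 500e6) / p
print(ratio)
```

The quadratic Vdd term is why voltage scaling is the most effective dynamic-power lever among the techniques listed later.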
Internal Power:
Power consumed by the cell when an input changes but the output doesn't change. Lower threshold
voltages and slower transitions result in more internal power consumption.
Short Circuit Power:
For finite rise and fall times, when Vtn < Vin < (Vdd - Vtp) holds, there is a conductive path
open between Vdd and GND, because both the nMOS and pMOS devices are simultaneously on.
Short-circuit power is commonly approximated as

Psc ≈ tsc * Vdd * Ipeak * f

where tsc is the time during which both devices conduct in a transition and Ipeak is the peak
short-circuit current.
This short-circuit power component is usually not significant in logic design, but it appears in
transistors that drive large capacitances, such as bus wires and especially off-chip circuitry.
As on-chip wires became narrower, long wires became more resistive, and CMOS gates at the end of
those resistive wires see slow input transitions.
To minimize the total average short-circuit current, it is desirable to have equal input and
output edge times. In that case, the power consumed by the short-circuit current is typically
less than 10% of the total dynamic power. An important point to note is that if the supply is
lowered below the sum of the thresholds of the transistors, Vdd < Vtn + |Vtp|, the short-circuit
currents are eliminated, because both devices are never on at the same time for any input voltage
value. By balancing transistor sizes we can get equal rise and fall times.
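Using the common approximation P_sc ≈ t_sc * Vdd * I_peak * f (a textbook form; all the numbers below are assumptions), we can sanity-check the "less than 10% of dynamic power" claim:

```python
# Short-circuit power estimate: overlap time * Vdd * peak current * frequency

def short_circuit_power(t_sc_s, vdd_v, i_peak_a, freq_hz):
    return t_sc_s * vdd_v * i_peak_a * freq_hz

# 50 ps overlap, 1.2 V, 1 mA peak crowbar current, 500 MHz (all assumed)
p_sc = short_circuit_power(50e-12, 1.2, 1e-3, 500e6)

p_dyn = 0.5e-3  # assumed dynamic power of the same gate, 0.5 mW
print(p_sc, p_sc / p_dyn)  # short-circuit share stays a small fraction
```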
2. Static Power
Transistor leakage current that flows whenever power is applied to the device
Independent of the clock frequency and switching activity
Static power is given by:
Pstatic = Ileakage * Vdd
where Ileakage is the total leakage current drawn from the supply.
Static power can be reduced by lowering the operating voltage and by using fewer (and less leaky) transistors.
Leakage Power:
The power consumed by subthreshold currents and by reverse-biased diodes in a CMOS
transistor is considered leakage power. The leakage power of a CMOS logic gate does not
depend on input transitions or load capacitance, and hence it remains roughly constant for a logic cell.
There are different low-power design techniques to reduce the above power components.
The dynamic power component can be reduced by the following techniques:
Clock gating.
Voltage and frequency scaling (DVFS, SVFS).
Gate sizing.
Multi-Vdd.
The static (leakage) power component can be reduced by the following techniques:
Multi-Vt.
Power gating.
Use of new devices such as FinFETs and SOI.
Back (substrate) bias.
Clock Gating:
The dynamic power is given by
Pdynamic = Af * Cload * Vdd^2 * f
Where Af: switching activity factor.
Cload: load capacitance.
Vdd: supply voltage.
f: clock frequency.
For the clock net itself Af = 1, since the clock toggles every cycle.
The clock is the most active signal in the design: it has a high activity factor, so the clock
network ends up consuming a large fraction of the dynamic power (figures of up to 50% are
often cited). Clock gating reduces dynamic power by disconnecting the clock from an unused
circuit block to limit the switching activity of the clock. The clock-gating technique thus reduces
the dynamic power consumed by limiting the switching activity factor.
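Plugging numbers into the equation above shows why gating pays off. A small sketch (illustrative values only; the gated fraction is hypothetical):

```python
def clock_net_power(c_clock, vdd, freq, enabled_fraction=1.0):
    """Clock net toggles every cycle (Af = 1); gating scales the power
    by the fraction of cycles the gated clock is actually enabled."""
    return c_clock * vdd ** 2 * freq * enabled_fraction

always_on = clock_net_power(50e-12, 1.0, 1e9)        # 50 pF clock net, 1 V, 1 GHz
gated     = clock_net_power(50e-12, 1.0, 1e9, 0.25)  # clock enabled 25% of the time
```

With these numbers, gating the clock for 75% of the cycles removes 75% of that net's dynamic power.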
How Clock Gating Works
As shown in Figure 1, the same circuit is implemented once without clock gating and once with
clock gating. In Figure 1(a), when the enable is high, the input D is propagated to the next
synchronous element (flip-flop), and the new data D appears at the output Q on the clock
edge. When the enable is low, the recycled data is propagated. In both cases (enable high
or low) the clock continues to toggle (switch) at the flip-flop, which dissipates dynamic
power.
Figure 1(a): Circuit without clock gating
As shown in Figure 1(b), the clock to the flip-flop is applied through an AND gate; this clock is
called a gated clock, and the technique is clock gating. When the enable is low, the clock does not
toggle (switch) at the flip-flop because of the AND gate. In this way the clock-gating technique
reduces the switching activity of the clock in order to save power.
Figure 1(b): Circuit with Clock gating
Multi Voltage Design:
Power is a primary concern in many segments of today's electronics business. As discussed in
earlier posts, power in IC design has two components: dynamic and static. Dynamic power
comprises internal power and switching power, whereas static power comprises leakage power.
Internal (dynamic) power includes short-circuit (Vdd-to-GND) power as well as the power
consumed by the switching of internal nets. Switching (dynamic) power is due to the charging and
discharging of load capacitance during switching.
We know that dynamic power is proportional to C * V^2 * f, where C is the load capacitance, V the supply voltage, and f the clock frequency.
The dynamic power in designs is growing rapidly because of dramatic increases in clock speeds and
transistor counts. Clock gating reduces the dynamic power due to switching, but dynamic power
varies only linearly with frequency, whereas it varies with the square of the operating voltage.
Therefore, we can reduce the dynamic power most significantly by reducing the operating voltage.
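The quadratic dependence is what makes voltage scaling so effective. A small sketch, relative to nominal operating conditions (scale factors are illustrative):

```python
def dynamic_power_ratio(v_scale, f_scale):
    """Relative dynamic power after scaling Vdd by v_scale and the clock
    frequency by f_scale (P proportional to V^2 * f)."""
    return v_scale ** 2 * f_scale

# Running at 80% voltage and 80% frequency keeps only ~51% of the dynamic
# power, even though performance drops by just 20%.
ratio = dynamic_power_ratio(0.8, 0.8)
```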
Challenges and Requirements for Multi Voltage Design:
Figure 1: Multi Voltage Design Style
Multi-voltage design styles vary with the target application. Figure 1 shows three different design
styles used today. The most standard style consists of partitioning the design into independent
voltage areas (or islands) that can function at a specific minimum voltage under a given
performance constraint. Each voltage area operates at a single voltage: this can be the same as the
chip's main Vdd or a different voltage. Another commonly used multi-voltage
design style consists of a power-down mode where one or more voltage areas may be shut down
to conserve power during low-performance operating modes, such as sleep or hibernation. The
most advanced multi-voltage design style, however, is Adaptive Voltage Scaling (AVS). AVS uses
on-chip (or off-chip) monitors to adaptively adjust voltage levels based on operating-mode
requirements and on process and temperature variation.
To achieve multi-voltage design, a systematic solution is required that:
 Supports advanced infrastructures, offering the libraries and cells required for different multi-
voltage design styles
 Offers integrated RTL-to-GDSII implementation with advanced, convergent dynamic- and
leakage-power optimization
 Ensures timing, SI, power, and power-integrity sign-off.
Power Gating:
Power gating is a low-power technique for deep-submicron technologies. It is performed by
shutting down the power to a portion of the design in order to reduce the static (leakage) power
of the design. The power switch (PS) cell is the basic element used in the power-gating technique
to shut down the power to a portion of the design; the PS cell is also known as a power-management
cell. The basic idea of power gating is to separate the VDD or GND power supply from the
standard cells of a specific design hierarchy.
Appropriately sized PMOS (header) or NMOS (footer) transistors are used as power switch (PS)
cells. The two differ only in which power rail they switch: VDD for headers and VSS for footers,
as shown in Figure 1 below. Designers tend to use header switches, since headers leak less and
are easier to implement. A switch cell has two modes of operation: on, in which it connects the
devices inside the block to the power source, and off, in which it disconnects them.
This cuts the leakage current flow in the devices of the block.
There are two approaches to power gating:
1. Fine Grain Power Gating.
2. Coarse Grain Power Gating.
Figure 1: Power Gating
In the fine-grain power-gating technique, each standard cell has a built-in power switch, whereas
in the coarse-grain technique, switches control an entire block of standard cells using large
transistors. Each approach has its trade-offs. Fine grain is easier to handle in terms of
timing analysis, but comes with significant area overhead, resulting in higher fabrication cost.
Coarse-grain switches, on the other hand, require more consideration in terms of timing and
wake-up time, but show greater leakage savings. Coarse-grain power gating is the common
implementation technique nowadays and can reduce leakage current by around 30x.
Power Switches Placement Styles:
Coarse-grain implementation allows multiple placement topologies for the power switches. For
example, switches can be placed around the power domain (in columns or as a ring) or in an array
inside the domain area. The array style is the more common technique, as it yields smaller IR
drop and less area; it is also more efficient with respect to the power-gate control sequence. On the
other hand, the ring approach saves the user from synthesizing a complicated power grid, and
it also gives better placement results, as it removes fragmentation from the placement areas.
The array style also best suits flip-chip designs, where power is delivered from bond pads placed
over the core as well; this reduces IR drop significantly compared to the ring placement style.
Low Power Cells:
To facilitate data transfer between power domains operating at different voltage levels, it
is recommended to use level shifters. Usually both low-to-high and high-to-low level shifters are
provided by library vendors. Level shifters are used for two main reasons. First, when a
signal propagates from a low-voltage block to a high-voltage block, the lower voltage at the PMOS
gate might leave that transistor not entirely switched off, which can cause abnormal leakage
current. Second, because signals must transition across voltage domains, level shifters should
be used to ensure that both net transitions and net delays are accurately calculated.
For power domains that share the same operating voltage but where some domains may be shut off, an
isolation cell is required at the power-domain interface. The reason is that the inputs of cells
connected to powered-off blocks become floating, which may cause high leakage power. Therefore,
isolation cells are necessary to isolate floating inputs. The isolation is performed by setting a
default logic value on the output depending on the state of a dedicated control pin. Usually two types
of isolation cells are provided by the library vendor, clamp0 and clamp1, which differ in the
default value set in the isolation state. The desired cell type is chosen according to the functionality on
the receiver side.
Blocks that operate at different voltage levels, some of which can also be turned off, require both
isolation and level-shifting functions at the power-domain interface. To simplify implementation,
library vendors usually supply a single cell called the enable level shifter, which is basically a
level shifter that includes an enable signal.
The recommendation is to place enable level shifters on all outputs of such blocks. Both isolation
cells and enable level shifters are placed in the always-on area. Figure 2 illustrates low-power
cell usage between various types of power domains.
Figure 2: Low Power Cells Usage
Power Switch Count:
In order to ensure correct operation in functional mode, we need to make sure the IR drop stays
within the cell characterization range (usually 10% of the nominal voltage). Since power switches
operate in the linear region when they are turned on, they act like resistors that drop the voltage
according to their resistance, as described in Figure 3.
Figure 3: IR Drop through Power Switch.
The minimal number of power switches can be determined from the following data:
The DC I/V curve of the switch (transistors in the linear state).
The IR-drop limit for the switches.
The domain power consumption.
Given these inputs, the minimum number of switches is, to first order:
Nmin = Idomain / Iswitch
where Idomain is the total current drawn by the domain and Iswitch is the current one switch can carry while staying within the IR-drop limit (taken from the DC I/V curve).
Additional optimization can be made for the leakage/performance trade-off: while a large number of
switches increases total leakage and area, an insufficient number of switches increases IR drop and
degrades performance.
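A first-order version of this sizing calculation is sketched below; the per-switch current limit is derived from the IR-drop budget and the switch's on-resistance, and all values are illustrative:

```python
import math

def min_switch_count(domain_current, ir_drop_limit, switch_resistance):
    """Lower bound on power-switch count: each ON switch behaves as a
    resistor, so one switch may carry at most ir_drop_limit / R amps."""
    current_per_switch = ir_drop_limit / switch_resistance
    return math.ceil(domain_current / current_per_switch)

# Illustrative: a 2 A domain, 50 mV allowed drop, 1 ohm per switch.
n = min_switch_count(2.0, 0.05, 1.0)   # 40 switches minimum
```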
Different cells used for low-power design: level shifters, isolation cells, retention registers, power switches, and always-on cells.
Level Shifter:
Level shifters are used in multi-voltage designs, in which more than one voltage supply is used.
Consider a design with two voltage domains: domain V1 has a 1.2 V power supply and domain V2
has a 1 V power supply. Signals have to cross from one domain to the other in functional mode.
If a signal crosses from the low-voltage domain V2 to the high-voltage domain V1, its logic level
may be interpreted wrongly at V1. To prevent this, level shifters are inserted between the voltage
domains for signals that cross from the low-voltage domain to the high-voltage domain and vice versa.

The main function of a level shifter is to shift a signal from one voltage level to another,
depending on the voltage domains the signal crosses.
Isolation Cells:
Isolation cells are used between power domains. Consider two domains in a design, D1 and D2,
where D1 is in power-shutdown mode and D2 is active. Since D1 is powered down, it can propagate
invalid logic into D2. To prevent this, isolation cells are inserted between the domains to clamp a
known value at their outputs while D1 is shut down. Isolation cells should be placed in an always-on
domain to serve their function (clamping a known value toward the other domain).
Power Switches:
Power switches are used in the power-gating technique. As discussed in the previous post, power
gating reduces the static (leakage) power in the design by shutting down the power to a portion of
the design. Power switches turn off the portions of the design that are inactive at a given point
in time, reducing leakage power.
Retention Registers:
Retention registers are used to store register states before power-down mode; these values are
restored at power-up. Retention cells therefore need an always-on supply to serve this purpose,
and because they are always on, they consume power even in power-down mode.
Always On cells:
These cells are special cells that must remain powered at all times in order to serve their purpose.
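In a UPF-based flow (IEEE 1801), the cells above are typically inferred from a power-intent file rather than instantiated by hand. A minimal, hypothetical sketch follows; the domain, instance, port, and net names are invented for illustration, and a real file would also declare the top-level supply nets and ports:

```tcl
# Hypothetical names throughout; not tied to any particular design.
create_power_domain PD_CORE -elements {u_core}
create_supply_net VDD_SW -domain PD_CORE

# Coarse-grain power gating: a header switch controlled by sleep_n
create_power_switch sw_core -domain PD_CORE \
    -input_supply_port  {in VDD} \
    -output_supply_port {out VDD_SW} \
    -control_port       {ctrl sleep_n} \
    -on_state           {on in {sleep_n}}

# Clamp the outputs of the switchable domain while it is off
set_isolation iso_core -domain PD_CORE \
    -isolation_power_net VDD -clamp_value 0 -applies_to outputs

# Keep register state across power-down
set_retention ret_core -domain PD_CORE \
    -retention_power_net VDD
```

The implementation tools then insert the matching power switches, isolation cells, and retention registers from the library.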
Placement
Placement is the process of placing standard cells in the rows created at the floorplanning stage. The
goal is to minimize the total area and the interconnect cost. The quality of routing is largely
determined by the placement.
Inputs for Placement stage:
 Gate level netlist,
 Floor planned design,
 Design libraries,
 Design constraints,
 Technology file.
Gate Level Netlist:
Gate-level netlists contain references to standard cells and macros, which are stored in the logical
libraries, as well as to other hierarchical logic blocks. Before placing, one must ensure that all
references can be resolved.
Reference Libraries:
Reference libraries contain the logical and physical information of macros and standard cells used by
many designs. They are referenced by pointers in the design library for memory efficiency.
A standard cell library also contains a corresponding abstract view for each layout view.
Need for Placement
Placement is a critical step in the VLSI design flow, mainly for the following four reasons:
1. Placement is a key factor in determining the performance of a circuit. Placement largely
determines the length, and hence the delay, of interconnect wires. Interconnect delay can
consume as much as 75% of the clock cycle in advanced designs. Therefore, a good placement
solution can substantially improve the performance of a circuit.
2. Placement determines the routability of a design. A well-constructed placement
solution will have less routing demand (i.e., shorter total wirelength) and will distribute
the routing demand more evenly to avoid routing hotspots.
3. Placement decides the distribution of heat on the die surface. An uneven temperature profile
can lead to reliability and timing problems.
4. Power consumption is also affected by placement. A good placement solution can
reduce the capacitive load due to the wires (by having shorter wires and larger
separation between adjacent wires); hence the switching power consumption can be
reduced.
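Placement tools commonly estimate the interconnect cost of a candidate placement with half-perimeter wirelength (HPWL). A minimal sketch (pin coordinates are hypothetical):

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net: the semi-perimeter of the
    bounding box of its pin coordinates - a standard placement cost proxy."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# Three pins of a hypothetical net, in placement-grid units.
net = [(0, 0), (4, 2), (1, 5)]
length = hpwl(net)   # bounding box is 4 wide and 5 tall -> HPWL of 9
```

Summing HPWL over all nets gives the total-wirelength objective that points 1, 2, and 4 above refer to.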
Placement is the process of finding a suitable physical location for each cell in the design.
Placement is performed in two stages: coarse placement and legalization.
Coarse Placement:
The placement tool determines an approximate location for each cell according to the timing and
congestion constraints. The placed cells do not fall on the placement grid and may overlap each
other. Large cells, such as RAM and IP blocks, act as placement blockages for smaller, leaf-level
cells. Coarse placement is fast and is sufficiently accurate for initial timing and congestion
analysis.
Legalization:
During legalization, the placement tool moves the cells to precisely legal locations on the placement grid
and eliminates any overlap between cells. The small changes to cell locations cause the lengths of
the wire connections to change, possibly causing new timing violations. Such violations can often
be fixed by incremental optimization, for example, by re-sizing the driving cells.
The place_opt command is recommended for performing placement in most situations. This
command performs coarse placement, high-fanout net synthesis, physical optimization, and
legalization in a single operation. In certain applications, you might want to perform placement
tasks individually, using commands such as create_placement and physopt, for a greater degree of
control or to closely monitor the results as they are generated.
In the placement process, the placement tool considers possible trade-offs between timing and
congestion. Timing considerations bring cells closer together to minimize wire lengths and
therefore wire delays. On the other hand, congestion pushes cells further apart to
provide room for the connections. Congestion cannot be ignored entirely in favor of timing,
because rerouting wires around congested areas increases wire lengths and wire delays, thus
defeating the value of close placement.
In the place_opt command, the -congestion option causes the tool to apply more effort to
congestion removal, resulting in better routability. However, this option should be used only if
congestion is expected to be a problem because it requires more runtime and causes area utilization
to be less uniform across the available placement area. If congestion is found to be a problem after
placement and optimization, it can be improved incrementally with the refine_placement
command. Timing, area, and congestion optimization can also be done incrementally with the
psynopt command.
The -area_recovery option of the place_opt command allows the placement tool to recover chip area
where extra timing slack is available; for example, it can downsize cells in timing paths that have
positive slack. Placement is typically done before clock tree synthesis, so the
clock network is ideal and no clock buffer tree is available for accurate clock-network
timing analysis. To get more accurate timing results, you should use the same commands as those
used in the synthesis tool to specify non-zero latency, uncertainty, and transition times for the clock
network.
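Strung together, the commands discussed above form a short script; this is a sketch only, and option spellings should be checked against the tool version in use:

```tcl
# Coarse placement + HFN synthesis + physical optimization + legalization,
# with extra congestion effort and downsizing where slack allows
place_opt -congestion -area_recovery

# Incremental congestion fix-up if hotspots remain after the first pass
refine_placement

# Incremental timing, area, and congestion optimization
psynopt
```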
Placement:
Figure 1: Placement Flow
Before the start of placement optimization, all wire load models (WLMs) are removed.
Placement uses RC values from the virtual route (VR) to calculate timing. The VR is the shortest
Manhattan distance between two pins. VR RCs are more accurate than WLM RCs.
Placement is performed in four optimization phases:
1. Pre-placement optimization
2. In placement optimization
3. Post Placement Optimization (PPO) before clock tree synthesis (CTS)
4. PPO after CTS.
Pre-placement optimization optimizes the netlist before placement; high-fanout nets (HFNs) are
collapsed. It can also downsize cells.
In-placement optimization re-optimizes the logic based on the VR. It can perform cell sizing, cell
moving, cell bypassing, net splitting, gate duplication, buffer insertion, and area recovery.
Optimization performs iterations of setup fixing and incremental timing- and congestion-driven
placement.
Post-placement optimization before CTS performs netlist optimization with ideal clocks. It can
fix setup, hold, and max transition/capacitance violations. It can do placement optimization based
on global routing, and it redoes HFN synthesis.
Post-placement optimization after CTS optimizes timing with propagated clocks. It tries to
preserve clock skew.
2. What is Scan chain reordering? How will it impact Physical Design?
Grouping together cells that belong to the same region of the chip, so that scan connections are
allowed only between cells of the same region, is called scan clustering. Clustering also helps
eliminate congestion and timing violations.
Types of scan cell ordering:
 Cluster-based scan cell order / power-driven scan cell order.
 Power-optimized, routing-constrained scan cell order.
Power-driven scan cell order:
 Determines the chaining of the scan cells so as to minimize the toggling rate in the scan
chain during shift operations.
 Identifies the inputs and outputs of the scan cells of the scan chain to limit the propagation
of transitions during the scan operations.
If the scan-chain wirelength is reduced, wireability increases (or the chip die area shrinks),
while signal speed increases through the reduced capacitive loading on the register pins shared
with the scan chains.
After scan synthesis, connecting all the scan cells together naively may cause routing congestion
during place and route (PAR), which causes area overhead and timing-closure issues.
Scan chain optimization is the task of finding a new order for connecting the scan elements such
that the wirelength of the scan chain is minimized.
Other Answers
Answer 1:
Based on timing and congestion, the tool optimally places standard cells. While doing so, if scan
chains are detached, it can break the chain ordering (which was done by a scan-insertion tool such as
Synopsys DFT Compiler) and reorder the cells to optimize placement; it maintains the number of
flops in a chain.
Answer 2:
During placement, optimization may make the scan chain difficult to route due to congestion,
so the tool re-orders the chain to reduce congestion.
This sometimes introduces hold-time problems in the chain; to overcome these, buffers may have
to be inserted into the scan path. The tool may not be able to maintain the scan-chain length
exactly, and it cannot swap cells between different clock domains.
Clock Tree Synthesis
What do you mean by CTS?
Clock tree synthesis (CTS) is the process of inserting buffers/inverters along the clock paths of
an ASIC design to balance the clock delay to all clock inputs. CTS is performed in order to
balance the skew and minimize the insertion delay.
Checklist before CTS
 Placement – completed
 Power and ground (PG) nets – prerouted
 Estimated congestion – acceptable
 Estimated max transition/capacitance – no violations
 High-fanout nets
Checklist after CTS
 Skew report
 Clock tree report
 Timing reports for setup and hold
 Power and Area report
Inputs required for CTS
 Detailed Placement Database
 Target for latency and skew if specified
 Buffers or inverters for building the clock tree
 Clock tree DRCs (max transition, max capacitance, max fanout, max number of buffer levels)
Output of CTS
 Database with a properly built clock tree in the design.
CTS Goal
 Minimizing Clock Skew
 Minimizing Insertion delay
 Minimizing Power dissipation.
 Clock Tree Synthesis is a process which makes sure that the clock gets distributed evenly
to all sequential elements in a design.
 The goal of CTS is to minimize the skew and latency.
 The placement data will be given as input for CTS, along with the clock tree constraints.
 The clock tree constraints will be Latency, Skew, Maximum transition, Maximum
capacitance, Maximum fan-out, list of buffers and inverters etc.
 The clock tree synthesis contains clock tree building and clock tree balancing.
 Clock tree can be built by clock tree inverters so as to maintain the exact transition (duty
cycle) and clock tree balancing is done by clock tree buffers (CTB) to meet the skew and
latency requirements.
 Less clock tree inverters and buffers should be used to meet the area and power constraints.
 There can be several structure for clock tree:
 H-Tree
 X-Tree
 Multi-level clock tree
 Fishbone
 Once CTS is done, we have to check the timing again.
 The outputs of clock tree synthesis are Design Exchange Format (DEF), Standard Parasitic
Exchange Format (SPEF), and Netlist etc.
NOTES:
 Normal inverters and buffers are not used for building and balancing the clock tree
because clock buffers provide better slew and drive capability than normal buffers,
and clock inverters provide a better balance between rise and fall times, thus
maintaining the 50% duty cycle.
 Effects of CTS: many clock buffers are added, congestion may increase, and crosstalk
noise and crosstalk delay may appear.
 Clock tree optimizations: achieved by buffer sizing, gate sizing, HFN synthesis, and
buffer relocation.
Set Up Fixing:
 Upsize the cells (increase the drive strength) in the data path to decrease the delay
through them.
 Pull in the launch clock.
 Push out the capture clock.
 Reduce the number of buffers in the data path.
 Replace buffers with two inverters placed farther apart so that the delay can be adjusted.
 Reduce any larger-than-normal capacitance on a cell output pin.
 Use LVT (low-Vt) cells.
Hold Fixing:
It is well understood that hold slack improves when the data path has more delay, so we have to
add delay in the data path.
 Downsize the cells (decrease the drive strength) in the data path.
 Pull in the capture clock.
 Push out the launch clock.
 Add buffers/inverter pairs/delay cells to the data path.
 When downsizing cells in the data path, it is better to downsize cells in the capture
path, closer to the capture flip-flop, because there is less chance of affecting other
paths and causing new errors.
 Increasing the wire load model can also fix a hold violation.
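All of the setup and hold fixes above act on two slack equations. A sketch of the underlying arithmetic, under a simplified flop-to-flop model (numbers are illustrative, in ns):

```python
def setup_slack(period, skew, clk_to_q, data_delay, setup_time):
    """Setup slack = (period + skew) - (clk-to-Q + data-path delay + setup).
    skew = capture latency - launch latency, so pushing out the capture
    clock (positive skew) helps setup."""
    return (period + skew) - (clk_to_q + data_delay + setup_time)

def hold_slack(skew, clk_to_q, data_delay, hold_time):
    """Hold slack = (clk-to-Q + data-path delay) - (hold + skew); adding
    data-path delay (buffers, downsized cells) helps hold."""
    return (clk_to_q + data_delay) - (hold_time + skew)

# 2 ns period, zero skew.
s = setup_slack(2.0, 0.0, 0.2, 1.5, 0.1)   # about 0.2 ns of setup margin
h = hold_slack(0.0, 0.2, 0.1, 0.05)        # about 0.25 ns of hold margin
```

Note the tension: any skew change that helps setup hurts hold by the same amount, which is why the two lists above move the launch and capture clocks in opposite directions.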
Transition violation
When a signal takes too long to transition from one logic level to another, a transition
violation is caused. Transition violations can be due to node resistance and capacitance. Fixes include:
 Upsizing the driver cell.
 Decreasing the net length by moving cells nearer, or reducing long routed nets.
 Adding buffers.
 Increasing the width of the route at the violating instance pin; this decreases the
resistance of the route and fixes the transition violation.
Cap violation:
The capacitance on a node is a combination of the fanout of the output pin and the capacitance of
the net. This check ensures that a device does not drive more capacitance than it is
characterized for.
 The violation can be removed by increasing the drive strength of the cell.
 Buffering some of the fanout paths reduces the capacitance seen by the output
pin.
CLOCK TREE BEGIN AND END
The clock tree begins at the SDC (Synopsys Design Constraints) defined clock source.
The clock tree ends at the clock pins of flip-flops or hard macros, or at the input pins of combinational logic.
Terminologies and Definitions in CTS
 Stop (sync) pins: all clock pins of flip-flops are called stop (sync) pins. The clock signal
should not propagate beyond a sync/stop pin. These pins must be considered when
building the clock tree.
 Exclude (ignore) pins: all non-clock pins, such as the D pin of a flip-flop or combinational
logic inputs, are called exclude (ignore) pins. These pins need not be considered during
clock tree propagation.
 Float (implicit stop, or macro model) pins: the same as sync/stop pins, except that the
internal clock latency of the pin is taken into consideration while building the clock tree.
This is the clock entry pin of a hard macro and needs to be considered while building the
clock tree; before treating it as a sync pin, the macro's internal tree needs to be balanced.
 Explicit sync (stop) pins: an input of combinational logic that is considered while building
the clock tree. This mostly comes into the picture when clock gating is used.
 Explicit exclude (ignore) pins: a flip-flop clock pin that is not considered a sync/stop pin.
This again arises from the use of clock gating, because when gating the clock, the clock
signal is fed to an AND gate.
Figure 1: CTS Terminologies
In synchronized designs, data transfer between functional elements is synchronized by clock
signals. In a top-level digital design, you will have one or more clock sources, such as PLLs or
oscillators within the chip. You may also have an external clock source connected through an
IO. For a digital-only block, you will have a clock pin that serves as the clock source for the
block in question. Clock balancing is important for meeting the design constraints, and clock
tree synthesis is done after placement to achieve the performance goals.
After placement you have the positions of all the cells, including macros and standard cells.
However, you still have an ideal clock. (For simplicity, we will assume that we are dealing
with a single clock for the whole design.) At this stage, buffer insertion, gate sizing, and
other optimization techniques are employed on the data paths, but no change is made to the
clock net.
The same clock net connects all the synchronous elements in the design, irrespective of the
number.
This is how your design’s clock network is at this point.
Figure 2: Clock net before CTS
This is definitely not something we want. Think just about the load of one clock net: no single
driver can drive that many flops! But when it is a synchronizing signal like the clock, load or
fanout is not the only thing we are worried about. We also want a "balanced" tree, that is,
the skew of the clock tree should be close to zero. After clock tree synthesis, the clock net
will be buffered as below.
Figure 3: Clock Net After CTS.
The main concerns in CTS are:
1. Skew – One of the major goals of CTS is to reduce clock skew.
Let us see some definitions before we go into clock skew.
 Clock Source
Clock sources may be external or internal to your chip/block. For CTS, what we
are concerned about is the point from which clock propagation starts for the digital
circuitry. This can be an IO port, the output of a PLL or oscillator, or even the output
of a gate down the line (e.g., a mux output). A clock source for CTS may also be specified
using the 'create_generated_clock' command. This defines an internally generated clock
for which you want to build a separate tree, with its own skew, timing, and inter-clock
relations.
You specify the clock source(s) using the create_clock command:
create_clock -name XTALCLK -period 100 -waveform { 0 50 } [get_pins {xtal_inst/OUT}]
create_clock -name clk -period 100 -waveform { 0 50 } [get_ports {clk}]

create_generated_clock -name div_clk1 \
    -source [get_pins {block1/clk_out}] -divide_by 2 \
    -master_clock [get_clocks {clk}]
 Clock Sinks
Sinks, or clock stop points, are the nodes which receive the clock. Default sinks are the clock
pins of your synchronous elements, such as flip-flops.
Now let us define skew as the maximum difference among the delays from the clock source to
clock sinks.
In the picture above, the delays to the clock sinks are given. The skew in this case is the difference
between the maximum delay and the minimum delay:
Skew = 20 ns - 5 ns = 15 ns
The goal of clock tree synthesis is to get the skew in the design close to zero, i.e., every clock
sink should receive the clock at the same time.
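The skew computation above is simply a max-minus-min over the sink insertion delays; a minimal sketch (the middle latency is invented, only the 5 ns and 20 ns extremes come from the example):

```python
def clock_skew(sink_latencies):
    """Skew = max insertion delay - min insertion delay over all clock sinks."""
    return max(sink_latencies) - min(sink_latencies)

# Latencies (ns): sinks at 5, 12, and 20 ns.
skew = clock_skew([5.0, 12.0, 20.0])   # 15.0 ns, as computed in the text
```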
2. Power – The clock is a major power consumer in your design. Clock power consumption depends on
switching activity and wire length. Switching activity is high, since the clock toggles constantly. Clock
gating is a common technique for reducing clock power by shutting off the clock to unused sinks.
Clock gating per se is not done at layout; it should be incorporated in the design. However, clock
tree synthesis tools can recognize the clock gates and also perform power-aware CTS.
In the picture above, FF1 gets the ungated clock CLK, while FF2 and any subsequent flops get a
gated clock. This clock is turned on only when the signal EN is asserted (see ICG cells).
Make sure that you specify the clock as propagated at the CTS stage, i.e., instead of an ideal delay
for the clock, you now calculate the actual delay values through the clock tree. This in turn gives
you a more realistic report of the timing of the design. You can propagate the clock using the command:
set_propagated_clock [all_clocks]