Signoff Semi Blog
A System on Chip (SoC) is an integrated circuit that integrates all components of an electronic
system. It may contain digital, analog, mixed-signal, and radio-frequency modules, all on a
single substrate. SoCs are very common in the mobile computing market because of their low
power consumption.
SoC designs usually consume less power and have a lower cost and higher reliability than the
multi-chip systems that they replace. And with fewer packages in the system, assembly costs
are reduced as well.
Advantages of SoC
1. Compact system size (chip size is much smaller than an equivalent board)
2. Less power consumption (fewer components, fewer I/Os and fewer passive components
help reduce power)
3. High performance
4. Lower system cost
PCB – SoC
SiP (System in Package):
Advantages
1. Lower development cost
2. Faster turnaround time (shorter development cycle)
3. Chips from different process technologies can be mounted in the same package
4. Higher yield, since the individual chip sizes are small
Why is PMOS slower than NMOS?
Most answers stop at "the mobility of holes is lower than the mobility of electrons in a
semiconductor". Obviously, the next question would be "Why?"
To understand this concept, let's go back to the basics of energy band diagrams.
Conduction electrons (free electrons) travel in the conduction band, while holes travel in the
valence band. In an applied electric field, holes cannot move as freely as free electrons because
their movement through the valence band is more restricted. The mobility of a carrier in a
semiconductor is larger if its effective mass is smaller and if the time between scattering events
is longer. Holes have a larger effective mass than electrons, so the mobility of electrons is higher
than that of holes. Now we understand why PMOS is slower than NMOS.
In short, a hole moving in the valence band responds far less readily to an applied field than a
free electron moving in the conduction band.
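As a compact way to see this, the textbook drift-mobility relation (added here for reference, in the same plain notation the post uses for its other formulas) is:
μ = q · τ / m*
where q is the carrier charge, τ the mean time between scattering events, and m* the effective mass. Since m* is larger for holes in the valence band than for electrons in the conduction band, μp < μn.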
Because of this mobility mismatch, the beta ratio (Wp/Wn) comes into the picture while designing
standard logic cells. We also require special cells for the clock network, whose rise and fall times
are very well balanced. Why do clock cells need this balance? We will discuss this in the CTS
training session.
PROCESS OVERVIEW
Most of the semiconductor giants do a lot of research in process technology. The increasing need
for ultra-low-power, high-performance, small-form-factor and low-cost ICs is making fabs and
companies put more money into this research. The traditional planar bulk process has many issues
with further scaling (like leakage and short-channel effects). As of now, the two most popular
solutions to this problem are 3D FinFET technology and FD-SOI. We will briefly discuss these
process technologies along with bulk CMOS.
Different types of process technologies:
1. Bulk CMOS
2. FD-SOI (Fully Depleted Silicon on Insulator)
3. FinFET
BULK CMOS
It is a planar process. The cross-sectional view of the devices in bulk CMOS is shown below:
One of the main limitations of bulk planar transistors is that the channel region underneath the gate
is deep, and much of the channel is too far from the gate to be well controlled. The result is higher
leakage power (static/stand-by power): the transistor is never truly turned off.
FD-SOI (UTBB FD-SOI: Ultra-Thin Body and Buried oxide, Fully Depleted Silicon On
Insulator)
Key advantages of Fully depleted SOI:
1. Better electrostatic control of the channel
2. No channel doping required
3. Limited short-channel effects, compared to bulk CMOS
4. Minimal junction capacitance and diode leakage
5. Low leakage power
FinFET
Scaling became very difficult in planar bulk CMOS below 28 nm. In a FinFET, the gate wraps
around a thin vertical fin of silicon. This form of gate structure provides improved electrical
control over the channel conduction, helps reduce leakage current levels and overcomes some
other short-channel effects.
Key advantages of FinFET:
1. Low power, hence allowing high integration levels.
2. FinFETs operate at a lower voltage as a result of their lower threshold voltage.
3. Because of reduced short-channel effects, further device shrinking is possible.
4. Very low leakage current.
5. Improved operating speed.
Process modelling
Process modelling predicts the geometries and material properties of wafer structures and
semiconductor devices as they result from the manufacturing process. We will be discussing
the use of process models (SPICE models) for timing of the entire IC. SPICE models of the
process are the source for any further timing models of the devices/cells/memories/IPs/sub-
systems, and interconnect models are also used during the simulations (which may generate
the timing models).
Temperature Inversion
Interconnect Variations & Parasitic Corners
Let us take the β ratio as 1.5; hence Wp = 1.5·Wn. The variables used for calculating the
standard cell height are given below:
p = Poly overhang; here it is 2 units.
x = Minimum well-to-well spacing required between two cells; here it is 12 units.
y = Half of the same-layer spacing that must be left at the cell edge to avoid a half-DRC
violation between two different cells abutted on VDD and VSS. This comes to 1.5 units.
Wp = Width of the PMOS.
Wn = Width of the NMOS.
Height of the standard cell = Wp + Wn + x + 2y + 2p = 88 units.
Using this formula, Wn is calculated as 27.6 units and Wp is calculated as 41.4 units.
Similarly we can calculate Wn and Wp values for different libraries.
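A quick back-of-the-envelope check of these numbers (a minimal Python sketch; the β ratio, the unit values and the 88-unit height are the ones used in this example):

# Solve Wp and Wn from the cell-height budget in this example.
# beta = Wp/Wn = 1.5, p = 2 (poly overhang), x = 12 (well-to-well spacing),
# y = 1.5 (half DRC spacing), total standard-cell height = 88 units.
beta, p, x, y, height = 1.5, 2, 12, 1.5, 88

wp_plus_wn = height - x - 2 * y - 2 * p   # 88 - 12 - 3 - 4 = 69
wn = wp_plus_wn / (1 + beta)              # 69 / 2.5 = 27.6
wp = beta * wn                            # 41.4

print(f"Wn = {wn} units, Wp = {wp} units")  # Wn = 27.6 units, Wp = 41.4 units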
If we compare 7T and 11T libraries, 11T is faster and will give better performance, because the
11T cell is taller, so transistors with higher drive strength can be placed in it.
Fig5: Jumper
De cap cells (Decoupling capacitor cells)
Decap cells are capacitors added in the design between the power and ground rails.
When there is a droop on the power rail, these cells act like a small local battery and help
maintain the voltage across the rails.
These cells mitigate IR-drop issues and suppress glitches on the power network.
In a design, most of the power is consumed by the clock circuits. Assume that all the clock
blocks are clustered in one area; they will then consume more power there, i.e., they draw
more current, which increases the IR drop. In such cases decap cells can be used.
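As a rough illustration of how a local decap value might be estimated (a first-order sketch only; C = I·Δt/ΔV is a common rule of thumb, and the current, duration and allowed-droop numbers below are assumed for illustration, not taken from this post):

# First-order decap estimate: the charge supplied by the cap during a transient
# must cover the local current demand while keeping the droop within budget.
#   Q = C * dV  and  Q = I * dt   =>   C = I * dt / dV
i_local = 0.005      # A : assumed local switching current demand
dt      = 0.1e-9     # s : assumed duration of the current spike
dv_max  = 0.025      # V : allowed droop (e.g. 2.5% of a 1.0 V rail)
c_required = i_local * dt / dv_max
print(f"Required local decap = {c_required * 1e12:.0f} pF")   # 20 pF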
End cap cell
End cap cells are added near the end of rows to terminate the rows properly.
The n-wells of end cap cells are properly terminated within the cell.
Tie cell
Tie cells are used to avoid a direct gate connection to the power or ground network, thereby
protecting the cell from damage.
In your design, some cell inputs may require a constant logic 0 or logic 1 value. Instead of
connecting these to the VDD/VSS rails/rings, you connect them to special cells available in
your library called TIE cells.
In a tie-high cell, a diode-connected NMOS provides a low level to the gate of a PMOS, so we
get logic 1 at the output; in a tie-low cell, a diode-connected PMOS provides a high level to the
gate of an NMOS, so we get logic 0 at the output.
1. Static (leakage) power dissipation:
The main leakage components are:
a. Sub-threshold current
b. Gate-oxide leakage
c. Diode reverse-bias current
d. Gate-induced drain leakage (GIDL)
It is hard to estimate the leakage current accurately; it mainly depends on the supply voltage
(VDD), threshold voltage (Vth), transistor size (W/L) and the doping concentration.
2. Dynamic power dissipation:
There are two sources of dynamic power dissipation: switching of the device, and the short-circuit
path from supply (VDD) to ground (VSS). Both occur during operation of the device.
a) Short-circuit power dissipation:
Because of a slow input transition, there is a certain duration of time "t" for which both devices
(PMOS and NMOS) are turned ON (input between Vtn and VDD-Vtp). During this time there is a
short-circuit path from VDD to VSS. The short-circuit power is given by:
Pshort-circuit = VDD · Isc · t
where VDD – supply voltage, Isc – short-circuit current,
t – short-circuit (overlap) time
b) Switching power dissipation:
This is the power dissipated during charging and discharging of total load [output capacitance
+ net capacitance + input capacitance of driven cell(s)]. The switching power is given by:
Pswitch = α · VDD² · Cload · f
where, α – Switching activity factor, f – Operating frequency,
VDD – Supply voltage & Cload – Load capacitance
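A small numeric sketch of the two dynamic-power terms above (the activity factor, load, frequency and short-circuit values are illustrative assumptions, not taken from this post):

# Dynamic power = switching power + short-circuit power.
# P_switch = alpha * VDD^2 * C_load * f.
# The post's P_short-circuit = VDD * Isc * t is the per-transition term;
# multiplying by the toggle rate f (an assumption here) gives an average power.
alpha  = 0.2        # switching activity factor (assumed)
vdd    = 1.0        # V
c_load = 10e-15     # F, total load on the net (assumed)
f      = 1e9        # Hz
i_sc, t_sc = 20e-6, 20e-12   # assumed short-circuit current and overlap time

p_switch = alpha * vdd**2 * c_load * f    # 2.0 uW
p_sc     = vdd * i_sc * t_sc * f          # 0.4 uW
print(f"P_switch = {p_switch*1e6:.2f} uW, P_short_circuit = {p_sc*1e6:.2f} uW")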
Common power reduction methods are:
Reduce VDD, Cload, f, α
Multi voltage design.
Multi Vth cells (LVT, RVT, HVT cells etc).
Cells with different drive strengths.
Dynamic Voltage & Frequency Scaling (DVFS).
Clock gating (switching power reduction).
Multi-track cells can be used in a design.
Multi-bit flipflops can be used.
Power management starts from the design-specification stage, and power-reduction techniques
are employed at each and every step of the physical design flow. The chart below shows an
overview of the power consumption at each stage.
A design has sub-systems with various functionalities. While the system is operating, the sub-
blocks that are not needed during a particular period of time can be turned OFF. Similarly,
blocks that do not require high-speed operation can be slowed down by reducing their supply
voltage. Sometimes a sub-system's performance requirement varies from time to time (DVFS).
All these power-reduction methods add complexity to the design.
UPF provides a universal low-power design specification, usually written in Tcl. At 90 nm and
above, dynamic power consumption is dominant; here comes the requirement of multi-voltage
designs (which require level shifters between different voltage domains).
As technology shrinks below 90 nm, static power consumption also becomes prominent.
Here comes the requirement of power gating (which requires isolation cells to isolate a
switching domain from an always-on domain).
To control all of these, a power management unit is used, which triggers the control signals of
the low-power cells as required.
The logical intent of the design is completely captured by the RTL code, but it is complicated to
express the power information there. Hence the power intent of the design is specified in UPF.
The power-management (UPF) file is built at the architecture stage of the design. Together they
form a complete description of the design. Various methods used for power management are:
Clock gating method (ICG) [logic intent of the design]
Multiple height cells
Multi-voltage design (MVD)
Power shut-off (PSO) or Power Gating
Multi-Vth design (MV)
Dynamic voltage and frequency scaling (DVFS)
1. Clock gating method:
It is a logical intent of the design which is provided in the RTL code. Suppose a group of flops
(meeting the minimum bit-width requirement) share the same load-enable, i.e., the data to these
flops is held constant for some time; the clock to them can be switched off during that time,
thereby saving dynamic power to a great extent. The clock is made available only when the data
changes. Clock gating is implemented using an ICG cell. Read more on clock gating in our
synthesis blog.
2. Multi-Voltage design:
As per the equation P = α·C·V²·f, as the supply voltage is scaled down, power reduces to a great
extent. Hence sub-systems that do not require a higher speed of operation can be operated at
lower voltages, saving dynamic power. The design can have multiple voltages as per the
performance requirement.
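For a feel of why voltage scaling is so effective (the quadratic VDD dependence; the 1.0 V to 0.8 V numbers are just an illustration):

# Relative switching power when only VDD changes (alpha, C and f held constant).
v_nominal, v_scaled = 1.0, 0.8
ratio = (v_scaled / v_nominal) ** 2
print(f"P_scaled / P_nominal = {ratio:.2f}")   # 0.64 -> about 36% dynamic power saved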
Sub-systems that operate at different voltages have separate power domains, each having
separate supply ports and nets. This technique requires level shifter when a signal is passed
from one domain to another, based on requirement. There are two types of level shifters:
Low to high
High to low
Whenever a signal from a low-voltage domain drives an input of a high-voltage domain, the full
output swing will not be available at the receiving gate in the high domain; the low-swing signal
changes the region of operation of the devices in the high domain. So a low-to-high level shifter
is used.
Whenever a signal passes from a high domain to a low domain, if the destination cell cannot
withstand the higher voltage, a high-to-low level shifter is inserted in that path. The level shifter
can be placed in the source or destination power domain, or in the default domain, and it takes
both voltages (the source-domain voltage and the destination-domain voltage) for its operation.
3. Power Gating :
Whenever operation of sub-blocks are not required, there is a scope to shutdown voltage
domains. This technique uses power switches to disable power. The power switches are
MTCMOS. During normal operation, LVT is used (to reduce short circuit power) and during
off mode, HVT is used (to reduce leakage power). Power switches are controlled by the power
management unit.
If the load is more, huge amount of in-rush current flows, to charge the internal capacitors. To
reduce this, the power switches are enabled in a daisy chain fashion.
Isolation Cell :
When a source domain (PD1) is in off mode, its output pins have to be isolated from the
destination domain (PD2) to prevent invalid logic from being propagated to PD2. Along with
isolation, this saves the short-circuit power dissipation at the receiver cell.
There are 2 types of isolation cell as per logic requirement:
“Clamp to 0” cell (AND gate)
“Clamp to 1” cell (OR gate)
Retention Flop :
Whenever a gated domain is turned off, the state of its flops needs to be retained with minimal
leakage power. When the gated domain is powered back on, the stored data can be used rather
than re-initializing everything.
This is achieved by using data-retention flops. A retention flop contains a DFF plus a shadow
latch, and it requires a low-power always-on supply to retain the data.
This feature comes at a cost: the area of a retention flop is larger than that of a normal flop, and
an additional low-voltage always-on power supply has to be provided.
Always ON cell :
Always-on cells are special cells which remain powered irrespective of their placement in a
switching domain. They are used to drive nets that belong to an always-on domain but pass
through a switchable domain. Generally always-on buffers and inverters are used. We need to
define the always-on cells in the UPF file.
4. Multi-Vth design (MV)
In a design, standard cells are provided in different flavours based on their threshold voltage.
Variation in threshold voltage affects both power consumption and timing, so these flavours are
used to trade off power against timing. These cells are usually named:
HVT cells
RVT cells
LVT cells
The table shows the characteristics of multi-Vth cells. The area of all flavours of a given cell is
always the same; only the threshold voltage varies, and hence the power and delay.
The design is synthesized with RVT and HVT cells, but during optimization LVT cells are used
to close the critical timing paths.
5. Dynamic voltage and frequency scaling (DVFS)
This method is used to vary the voltage and frequency based on requirement. The voltage
and/or frequency of the design can be scaled as per the performance requirement.
An advanced method, AVFS (adaptive voltage and frequency scaling), has been introduced,
where feedback is provided to the controller to decide the voltage and/or frequency, but it is
more complex.
Example: Consider the following design.
This design consists of a default (TOP) domain with three different power domains: APD1P2V,
SPD1P0V and APD0P8V.
APD1P2V – Always on power domain with 1.2V supply
SPD1P0V – Switching power domain with 1.0V supply
APD0P8V – Always on power domain with 0.8V supply
LS_LH – Level shifter low to high
LS_HL – Level shifter high to low
ISO – Isolation cell
RTF – Retention flop
PMU – Power management unit
AON_BUF – Always on buffer
UPF provides various commands to specify the power intent completely, and most of them are
easy to understand from the command name itself. A few of them are explained here and used
to write the UPF for the above example.
upf_version : UPF has evolved in stages and hence has different versions, so it is necessary to
state the version of UPF being used in order to interpret the UPF commands correctly.
upf_version [string]
The version can be 1.0, 2.0, etc.
Power Domain (PD) : A set of modules using the same supply voltage belongs to a power
domain. The command "create_power_domain" is used to define a power domain and its
characteristics.
UPF for the above Power Intent:
#———- Create Power Domains ————–#
create_power_domain TOP -include_scope
create_power_domain APD1P2V -elements { TOP/mod1 }
create_power_domain SPD1P0V -elements { TOP/mod2 }
create_power_domain APD0P8V -elements { TOP/mod3 }
#——– Supply Ports & Net Connections ————#
create_supply_port VDD1P2
create_supply_net VDD1P2 -domain TOP
create_supply_net VDD1P2 -domain APD1P2V -reuse
connect_supply_net VDD1P2 -ports VDD1P2
create_supply_port VDD1P0
create_supply_net VDD1P0 -domain TOP
create_supply_net VDD1P0 -domain SPD1P0V -reuse
create_supply_net VDD1P0_SW -domain SPD1P0V ;# switching net
connect_supply_net VDD1P0 -ports VDD1P0
create_supply_port VDD0P8
create_supply_net VDD0P8 -domain TOP
create_supply_net VDD0P8 -domain APD0P8V -reuse
connect_supply_net VDD0P8 -ports VDD0P8
create_supply_port VSS
create_supply_net VSS -domain TOP
create_supply_net VSS -domain APD1P2V -reuse
create_supply_net VSS -domain SPD1P0V -reuse
create_supply_net VSS -domain APD0P8V -reuse
connect_supply_net VSS -ports VSS
#———- Establish Connection ————-#
set_domain_supply_net TOP -primary_power_net VDD1P0 -primary_ground_net VSS
set_domain_supply_net APD1P2V -primary_power_net VDD1P2 -primary_ground_net VSS
set_domain_supply_net SPD1P0V -primary_power_net VDD1P0 -primary_ground_net VSS
set_domain_supply_net APD0P8V -primary_power_net VDD0P8 -primary_ground_net VSS
#———- Shut-down Logic for Receiver ————#
create_power_switch POWER_SWITCH -domain SPD1P0V \
-input_supply_port {VDD1P0 VDD1P0}\
-output_supply_port { VDD1P0 VDD1P0_SW} \
-control_port {PMU/ps_en } \
-on_state {state_name VDD1P0 {!ps_en}}
#———- Isolation Cell Setting ———–#
set_isolation iso_out -domain SPD1P0V \
-applies_to outputs \
-isolation_power_net VDD1P0 -isolation_ground_net VSS \
-clamp_value 1 \
-isolation_signal PMU/iso_en \
-location default
#———– Retention Logic for SPD ———-#
set_retention RTF -domain SPD1P0V \
-retention_power_net VDD1P0 \
-retention_ground_net VSS \
-save_signal {PMU/rtf_en high} \
-restore_signal {PMU/rtf_en low}
#——– Level Shifter for multi-VDD Domain ———#
set_level_shifter LS_0P8_1P0 -domain SPD1P0V \
-applies_to inputs \
-location self \
-source APD0P8V.primary \
-input_supply_set APD0P8V.primary -output_supply_set SPD1P0V.primary
set_level_shifter LS_1P0_1P2 -domain APD1P2V \
-applies_to inputs \
-location self \
-source SPD1P0V.primary \
-input_supply_set SPD1P0V.primary -output_supply_set APD1P2V.primary
set_level_shifter LS_1P2_0P8 -domain APD0P8V \
-applies_to inputs \
-location self \
-source APD1P2V.primary \
-input_supply_set APD1P2V.primary -output_supply_set APD0P8V.primary
set_level_shifter LS_0P8_1P2 -domain APD1P2V \
-applies_to inputs \
-location self \
-source APD0P8V.primary \
-input_supply_set APD0P8V.primary -output_supply_set APD1P2V.primary
#———– Define Always ON Cell ————-#
define_always_on_cell -cells AON_BUF \
-power_switchable VDD1P0_SW -ground_switchable VSS \
-power VDD1P0 -ground VSS
#——— Create Power State Table ———–#
add_power_state TOP.primary \
-state ON {-supply_expr {power == `{FULL_ON, 1.0} && ground == `{FULL_ON, 0.0}} -simstate NORMAL}
add_power_state APD1P2V.primary \
-state ON {-supply_expr {power == `{FULL_ON, 1.2} && ground == `{FULL_ON, 0.0}} -simstate NORMAL}
add_power_state SPD1P0V.primary \
-state ON {-supply_expr {power == `{FULL_ON, 1.0} && ground == `{FULL_ON, 0.0}} -simstate NORMAL} \
-state OFF {-supply_expr {power == `{OFF} && ground == `{FULL_ON, 0.0}} -simstate CORRUPT}
add_power_state APD0P8V.primary \
-state ON {-supply_expr {power == `{FULL_ON, 0.8} && ground == `{FULL_ON, 0.0}} -simstate NORMAL}
Synthesis
Synthesis is the process of converting RTL (synthesizable Verilog code) into a technology-specific
gate-level netlist (nets, sequential and combinational cells, and their connectivity).
Goals of Synthesis
1. To get a gate level netlist
2. Inserting clock gates
3. Logic optimization
4. Inserting DFT logic
5. Logic equivalence between RTL and netlist should be maintained
Input files required
1. Tech related:
.tf- technology related information.
.lib-timing info of standard cell & macros
2. Design related:
.v- RTL code.
SDC- Timing constraints.
UPF- power intent of the design.
Scan config- Scan related info like scan chain length, scan IO, which flops are
to be considered in the scan chains.
3. For Physical aware:
RC co-efficient file (tluplus).
LEF/FRAM- abstract view of the cell.
Floorplan DEF- locations of IO ports and macros.
Synthesis steps
3. IO / Pin placement
IOs / pins are placed at the boundary of the block. Usually the pin placement information is pushed
down from the full-chip (FC) floorplan, but these locations can be changed based on block-critical
requirements. Any change in pin location has to be discussed with the FC floorplan team. Timing-
critical interfaces need special attention; for example, the next 2-3 levels of logic from the IOs are
pre-placed near the IOs. Source-synchronous interfaces require delay balancing taking OCV into
consideration (this will require manual placement and scripting).
4. Row creation
Rows are created in the design using the basic unit cell site. Rows aid the systematic placement
of standard cells, and the standard-cell power routing is done considering the rows.
Rows can be cut wherever cell placement is not allowed, or a hard placement blockage can be
used instead.
5. Macro placement
Step 1 – Understand Pins & Orientation requirements of Macros
Step 2 – Follow data flow / hierarchy to place the Macros. Make use of reference floorplan
if available
Step 3 – All the pins of the Macros should point towards the core logic
Step 4 – Channels between macros should be big enough to accommodate all routing requirements
and should get at least one VDD & VSS pair of the power grid in the channel
Automatic Floorplan / Macro-placement
Most of the PnR tools provide an automatic floorplan option. The automatic floorplan option
creates its own macro placement based on the effort level and other options. But these options are
not mature enough to give an optimum floorplan for all kinds of designs. This option is handy
when the design has hundreds of macros, but the generated floorplan needs a lot of modification
for further optimization.
How to qualify Macro – Placement
1. All macros should be placed at the boundary
2. Check the orientation & pin directions of all macros
3. Spacing b/w macros should be enough for routing & power grid
4. Macros should not block partition level pins
5. [Iterations] Low congestion & good timing QoR – these cannot be achieved in one
shot but need a few iterations [thorough & deep analysis is the key while
iterating]
6. Adding placement & routing blockages
Buffer-only blockages are added in the channels between macros. Partial placement blockages can
be added in the channels to block sequential cells (whose placement in channels can degrade
CTS QoR). Partial blockages are also added in congestion-prone areas, notches and corners.
8. Adding special cells (Well Taps, EndCaps, Spare Cells, Metal ECO-able cells etc)
Well connection – Almost all standard cell libraries are tap-less (substrate connections are not
made at cell level). So well-tap cells are added at partition/chip level to tie the wells to
VDD/VSS. The maximum tap-to-gate spacing has to be met while adding the well-tap array.
Endcap cells – These cells are inserted to take care of the boundary DRCs of wells and other
layers. Endcap cells ensure proper termination of the rows so that no DRCs are created. These
are physical-only cells.
How to qualify Floorplan?
1. Check PG connections (For macros & pre-placed cells only)
2. LP / MV checks on floorplan database
3. Check the power connections to all Macros, specially analog/special macros if any
4. All the macros should be placed at the boundary
5. There should not be any notches / thin channels. If unavoidable, proper blockages have
to be added
6. Remove all unnecessary placement blockages & routing blockages (which might have
been added during floorplanning & pre-placement)
7. Check power connection to power switches
8. Check power mesh in the different voltage areas
9. Check pin-layers & check layer directions (H-V-H)
PD Flow II – Placement & Optimization
Placement
In this stage, all the standard cells are placed in the design (the size, shape & macro placement
were already fixed in the floorplan). Placement is driven by different criteria like timing,
congestion, power optimization etc. Timing and routing convergence depend a lot on the quality
of placement. The different tasks in placement are listed below;
1. Pre-placement
2. Initial placement (Coarse placement)
3. Legalizations
4. Removing existing buffer trees
5. High Fan-out Net Synthesis (HFNS)
6. Iterations of timing/power optimizations [cell sizing, moving, net splitting, gate cloning,
buffer insertion, area recovery]
7. Area recovery
8. Scan-chain re-ordering
9. TIE cell insertions
Goals of placement
1. Timing, Power and Area optimizations
2. Routable design (minimal global & local congestion)
3. No/minimal cell density, pin density & congestion hot-spots
4. Minimal timing DRCs
Before starting the placement optimization, it is always good practice to do some analyses and
checks on the design and tool settings. This definitely helps the design converge and reduces
iterations.
Things to be checked before placement
1. Check for any missing / extra placement & routing blockages
2. Check the don't-use cell list & whether it is properly applied in the tool
3. Check don't-touch attributes on cells & nets (make sure these are applied)
4. It is better to limit the local density (otherwise local congestion can create issues in
the routing / ECO stages)
5. Understand all optimization options & placement switches set in the tool
6. There should not be any high WNS timing violations
7. Make sure that clock is set to ideal network
8. Take care of integration guidelines of any special IPs (These won’t be reported in any
of the checks). Have custom scripts to check these guidelines
9. Fix all the hard macros & pre-placed cells
10. Check the pin access
Pre-placement
1. Spare cell insertion / Metal ECO-able cells
2. Magnet placement (IOs / any other interface)
3. Custom / manual placement of special cells (very specific to design)
4. Insertion of De-Caps (Not everyone follows this)
5. Antenna diodes & buffers on block level ports
HFNS
All high-fanout nets will be synthesized (given a buffer tree) except clock nets and nets with a
don't-touch attribute. Scan-enable and reset are a few examples of high-fanout nets. HFNS honors
the max-fanout setting; a rough sizing sketch is given below.
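A rough sketch of how such a buffer tree scales with fanout (a simple balanced-tree estimate driven only by an assumed max-fanout limit; real tools also consider placement, transition and capacitance):

import math

def hfns_tree_estimate(num_sinks, max_fanout):
    # Estimate levels and buffer count of a balanced buffer tree built purely
    # from a max-fanout constraint (illustrative only).
    levels, buffers, drivers = 0, 0, num_sinks
    while drivers > max_fanout:
        drivers = math.ceil(drivers / max_fanout)  # buffers needed at this level
        buffers += drivers
        levels += 1
    return levels + 1, buffers  # +1 level for the root driver itself

print(hfns_tree_estimate(num_sinks=2000, max_fanout=32))  # (3, 65)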
Different Timing optimization techniques
Timing convergence is one of the key tasks in placement optimization. If the timing QoR is bad,
the placement cannot be qualified. A bad timing QoR at the placement stage creates difficulties in
timing convergence in the later stages.
1. Assigning more weight to critical group path
2. Timing driven placement– high effort
3. Allowing LVT cells for optimizations (<5% of low / ultra low VT cells)
In most of the designs only 15-25% of the paths will be timing critical. So giving more weight
to these critical paths during optimization will aid in optimizing critical path delays. This can
be achieved by creating group paths and assigning more weight to the critical paths.
If the design is timing critical, then a timing-driven placement strategy has to be chosen with a
high optimization effort (a trade-off with runtime). But in some designs timing-driven placement
can create local congestion hotspots, and global congestion will also increase. Cell padding,
density screens, partial blockages and bounds can be used to reduce/fix these congestion issues.
Controlled usage of low-VT cells will help in optimizing timing critical paths. Most of the PnR
tools have the option to control VT usage.
Congestion reduction techniques
1. Cell padding
2. Use of density screens, placement blockages
3. Congestion driven placement (with high effort @ cost of runtime)
Congestion is one of the major challenges in PnR of high/medium-utilization designs. Placement
is the first and key step where congestion analysis begins, and it should be under control. Both
global and local congestion should be minimal, with no local hotspots. A thorough analysis of the
congestion map, cell-density map and pin-density map will help in deciding the quality of
placement.
Local congestion hotspots are very common in timing-critical, high-utilization designs. Clusters
of AOI/OAI (complex Boolean cells) or any other high-pin-density cells will cause local hotspots.
Power Optimization
Nowadays most designs are targeted to achieve low power consumption, because of the growing
demand for hand-held, battery-operated devices (smartphones, tablets) and IoT. So we should
keep an eye on static and dynamic power dissipation and make an effort to reduce both.
Dynamic power:
Transition and load capacitance are the two key parameters that can be controlled at the placement
stage to get optimum dynamic power. Iterations can be performed to arrive at the optimum
max-transition and max-capacitance targets. Most of the tools have options to optimize power.
Dynamic power dissipation is directly proportional to the toggle rate (switching activity). So, to
get maximum benefit, power optimization should be done on nets with a high toggle rate. 'Low-
power placement' helps to identify the nets/cells with high toggle rates, and their load capacitance
(wire length) is reduced to cut power dissipation.
Leakage power:
High-VT and regular-VT cells have less leakage power than low and ultra-low-VT cells. So it is
a good idea to block, or only partially allow, the usage of low and ultra-low-VT cells.
Scan chain Re-Ordering
The DFT tool flow makes a list of all the scannable flops in the design, sorts them based on
their hierarchy and performs scan stitching (clock domains and maximum-chain-length constraints
are considered). The scan chain at this stage is not layout friendly.
In the APR tool, the scan chains are reordered on the basis of the placement of the flops and the
Q-SI routing. This is nothing but scan-chain reordering. Scan-chain reordering helps to:
Reduce congestion and total wire length
Require fewer repeaters in the Q-SI path
The diagram below shows a pre-layout scan chain stitched based on the hierarchy.
If scan-chain reordering is not done, congestion and net/wire length will increase, as the next
diagram shows.
With the same flop placement but the scan chain reordered, congestion is better and wire/net
lengths are reduced; refer to the diagram below.
What if the design has different power domains?
The placement flow is almost the same. But in the case of abutted voltage-area designs, an extra
stage, "Voltage Area Feed-through" (VA-FT), is required before the placement stage.
Following tasks are done in VA-FT stage:
1. Enabling VA-FT creation in tool flow
2. Quick placement of the design (the requirement for VA-FTs will be known only after
placement of all standard cells)
3. Global route (To identify where all VA-FTs are required)
4. VA-FT creation
5. Disable VA-FT
6. Continue with place & optimizations
An example of FT port creation & FT buffer addition through different voltage areas (power
domains) is shown in the diagram below;
How to qualify placement
1. Logical equivalence check & low power checks
2. Check legalization
3. Check PG connections of all the cells
4. Check congestion, place density & pin density maps. All these should be under control
5. Timing QoR / convergence. There should not be any high-WNS violations, and TNS and
NVP must be under control
6. Minimal max tran & max cap violations
7. Check whether all don’t touch cells & nets are preserved
8. Check for don't-use cells (count should be zero / same as post-synthesis)
Clock Tree Synthesis- part 1
Clock Tree Synthesis (CTS) is one of the most important stages in PnR. CTS QoR decides timing
convergence and power. In most ICs the clock consumes 30-40% of the total power, so an
efficient clock architecture, clock gating and clock-tree implementation help to reduce power.
Sanity checks need to be done before CTS
Check legality.
Check power stripes, standard cell rails & also verify PG connections.
Timing QoR (setup should be under control).
Timing DRVs.
High Fanout nets (like scan enable / any static signal).
Congestion (running CTS on congested design / design with congestion hotspots can
create more congestion & other issues (noise / IR)).
Remove don’t_use attribute on clock buffers & inverters.
Check whether all pre-existing cells in clock path are balanced cells (CK* cells).
Check & qualify don’t_touch, don’t size attributes on clock components.
Preparations
Understand the clock structure of the design & the balancing requirements of the design. This
will help in coming up with proper exceptions to build an optimum clock tree.
Creating non-default rules (check whether shielding is required).
Setting clock transition, capacitance & fan-out.
Decide on which cells to be used for CTS (clock buffer / clock inverter).
Handle clock dividers & other clock elements properly.
Come up with exceptions.
Understand latency (from Full chip point of view) & skew targets.
Take care of special balancing requirements.
Understand inter-clock balancing requirements.
Difference between High Fan-out Net Synthesis (HFNS) & Clock Tree Synthesis:
CTS uses clock buffers and clock inverters with equal rise and fall times, whereas HFNS uses
buffers and inverters with relaxed rise and fall times.
HFNS is used mostly for reset, scan-enable and other static signals having high fanouts. There
is no stringent requirement on balancing or power reduction for them.
Clock-tree power is given special attention as the clock is a constantly switching signal. HFNS
is mostly performed on static signals, and hence not much attention to power is needed.
NDRs (non-default rules) are used for clock-tree routing.
Why buffers/inverters are inserted?
Balance the loads.
Meet the DRC’s (Max Tran/Cap etc.).
Minimize the skew.
What is the difference between clock buffer and normal buffer?
Clock buffers have equal rise and fall times, therefore pulse-width (duty-cycle) distortion is
avoided.
In clock buffers the beta ratio is adjusted such that the rise and fall times are matched. This may
increase the size of a clock buffer compared to a normal buffer.
Normal buffers may not have equal rise and fall time.
Clock buffers are usually designed such that an input signal with 50% duty cycle
produces an output with 50% duty cycle.
CTS Goals
Meet the clock tree DRC.
Max. Transition.
Max. Capacitance.
Max. Fanout.
Meet the clock tree targets.
Minimal skew.
Minimum insertion delay.
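As a reminder of how these targets are measured (a minimal sketch over a hypothetical table of sink latencies; the pin names and numbers are invented):

# Insertion delay (latency) of a sink = delay from the clock root to the sink pin.
# Global skew = max latency - min latency across all sinks.
sink_latency_ns = {                 # hypothetical post-CTS latencies
    "u_core/ff1/CK": 0.512,
    "u_core/ff2/CK": 0.505,
    "u_mem/ff7/CK":  0.531,
    "u_io/ff3/CK":   0.498,
}
max_lat = max(sink_latency_ns.values())
min_lat = min(sink_latency_ns.values())
print(f"insertion delay (max) = {max_lat:.3f} ns, skew = {max_lat - min_lat:.3f} ns")
# insertion delay (max) = 0.531 ns, skew = 0.033 ns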
Clock Tree Reference
By default, each clock tree reference list contains all the clock buffers and clock inverters in
the logic library. The clock tree reference list is used for:
Clock tree synthesis.
Boundary cell insertions.
Sizing.
Delay insertion.
Boundary cell insertions
When you are working on a block-level design, you might want to preserve the
boundary conditions of the block’s clock ports (the boundary clock pins).
A boundary cell is a fixed buffer that is inserted immediately after the boundary clock
pins to preserve the boundary conditions of the clock pin.
When boundary cell insertion is enabled, a buffer from the clock tree reference list is inserted
immediately after the boundary clock pins. For multi-voltage designs, the buffers are inserted
at the boundary in the default voltage area.
The boundary cells are fixed for clock tree synthesis after insertion; they cannot be moved
or sized. In addition, no cells are inserted between a clock pin and its boundary cell.
Fig. (a) shows a buffer driving four other cells. In fig. (b), the load is split using
Cloning. The first buffer is cloned and each buffer now drives half of the load.
In fig.(c), the load is split using buffering. Two new buffers are added at the
output of buffer A. Now buffer A drives C1 and C2, and each of them drives half of the load.
SETUP :
Reasons for Setup Violations:
Tcomb :
Tcomb (data-path) delay is high.
A high-RC metal might be used for routing in Tcomb, which increases the net
delay.
More HVT cells in the data path. Lower drive-strength cells in the data path.
Tsetup of the capture flop is more.
More negative skew : launch clock is late and capture clock is early.
Crosstalk delay : signals switching in the opposite direction result in more delay.
Fixes :
Vt swapping : Replace HVT cells with LVT/ULVT cells.
Upsize drivers in data path.
For long nets, if adding a buffer can reduce RC, improve transition and hence improve
timing, then add buffers.
Reduce fanout.
Layer optimization in data path : Use higher metals with lower RC Values to route in
data path. This is preferred only if the timing path is critical.
Fix cross talk using NDR Rules during routing stage.
HOLD :
Reasons for Hold Violations:
Tcomb delay is less due to :
More LVTs and ULVTs in the data path.
High drive-strength drivers in the data path.
Thold of the capture flop is more.
More positive skew.
Crosstalk : signals switching in the same direction make the data arrive early.
Fixes :
Vt swapping : Replace LVT/ULVT cells with HVT cells.
Add buffers in data path to increase data path delay.
Downsize drivers in data path.
Layer optimization in data path : Use lower metals with higher RC Values to route in
data path.
Fix cross talk using NDR Rules during routing stage.
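A minimal sketch of the setup and hold checks that the above fixes target (the standard single-cycle equations; all delay numbers are invented for illustration):

# Setup: data launched at edge N must arrive before edge N+1, minus the setup time.
# Hold : data launched at edge N must not disturb the value captured at edge N.
t_clk   = 1.000   # ns, clock period (assumed)
t_clk2q = 0.120   # ns, launch flop clk->Q delay
t_comb  = 0.700   # ns, combinational data-path delay
t_setup = 0.050   # ns, capture flop setup requirement
t_hold  = 0.030   # ns, capture flop hold requirement
skew    = 0.020   # ns, capture clock arrival minus launch clock arrival

setup_slack = (t_clk + skew) - (t_clk2q + t_comb + t_setup)   # +0.150 ns -> met
hold_slack  = (t_clk2q + t_comb) - (t_hold + skew)            # +0.770 ns -> met
print(f"setup slack = {setup_slack:+.3f} ns, hold slack = {hold_slack:+.3f} ns")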
AREA AND POWER OPTIMIZATION:
Need for area and power optimization:
Clock cells are larger than normal cells. Hence, they take more area and consume more
power.
LVTs are used in the clock path as they have less on-chip variation and less short-circuit
power. But they have more sub-threshold leakage power.
Clock is a high-switching net. Hence, it has more switching power.
Fixes :
Area Optimization :
Downsize clock buffers if a smaller clock buffer can drive the same load.
Power Optimization :
Downsize clock buffers if a smaller clock buffer can drive the same load.
Replace LVTs/ULVTs with HVTs in the data path where timing permits.
CONGESTION :
Causes :
The addition of extra buffers during CTS to achieve minimum skew and minimum insertion
delay can cause congestion.
Fixes :
Post-CTS, we cannot move any clock cells. So, for a well-optimized design post-CTS,
we have to do a proper congestion-driven placement in the initial stages itself, keeping
in mind the expected utilization post-CTS.
Cell padding : in congestion-prone areas, cell padding should be applied for standard
cells.
Routing
Routing is the stage after Clock Tree Synthesis and optimization where-
Exact paths for the interconnection of standard cells and macros and I/O pins are
determined.
Electrical connections using metals and vias are created in the layout, defined by the
logical connections present in the netlist.
After CTS, we have information of all the placed cells, blockages, clock tree buffers/inverters
and I/O pins. The tool relies on this information to electrically complete all connections defined
in the netlist such that-
There are minimal DRC violations while routing.
The design is 100% routed with minimal LVS violations.
There are minimal SI related violations.
There must be no or minimal congestion hot spots.
The Timing DRCs are met.
The Timing QoR is good.
The different tasks that are performed in the routing stage are as follows-
Global Routing (also performed during placement stage).
Track assignment.
Detailed Routing.
Search and Repair.
Goals of Routing
Minimize the total interconnect/wire length.
Maximize the probability that the tool can complete routing.
Minimize the critical path delay.
Minimize the number of layer changes that the connections have to make (minimizing
the number of vias).
Complete the connections without increasing the total area of the block.
Meeting the Timing DRCs and obtaining a good Timing QoR.
Minimizing the congestion hotspots.
SI driven: reduction in cross-talk noise and delta delays.
Routing Prerequisites
All the design rules required during the routing stage must be defined in the technology
file.
The design must be placed and optimised. CTS and optimization should be complete.
The PG nets must be pre-routed and physically connected to all macros and standard
cells.
The timing DRC violations and the timing QoR, estimated after CTS must be
acceptable.
The measured congestion should be tolerable.
There should not be any ideal nets.
The fanout of high-fanout nets should not exceed the specified limit.
Check for any optimization that needs to be done to fix any errors.
Checking routability.
After the placement and clock tree synthesis stage we must check if the design is ready
for routing. The checks performed are as follows-
Check if the ports of the standard cells are blocked i.e. the physical pins are not
accessible.
Checks for overlapping cells in the design. Overlapping causes pins to short and
cause metal DRC violations.
Check for pins underneath PG routes (they may be inaccessible and cause
violations on metals) .
Check if the ports of the top-level or macro cell are blocked and physically
inaccessible.
Check for pins that are outside the design boundary (Out-of-Boundary pins).
Check for blocked PG ports.
Check if there are frozen nets blocking ports.
Check for blocked unconnected pins.
Check if all pins in the design are on the routing tracks.
Routing Constraints
Setting routing constraints guides the tool during routing. The constraints to be set are
as follows-
Set constraints on the number of layers to be used during routing.
Setting the maximum length for the routing wires.
Set stringent guidelines for minimum width and minimum spacing.
Set preferred routing directions to specific metal layers during routing.
Constraining off-grid routing.
Blocking routing in specific regions.
Setting limits on routing to specific regions.
Setting precedence to routing regions.
Constraining the routing density.
Constraining the pin connections.
Restricting the degree of rerouting.
Global Routing
Global routing is a coarse-grain assignment of routes, which first partitions the routing region
into tiles/rectangles called global routing cells (gcells) and decides tile-to-tile paths for all nets
while attempting to optimize some given objective function (e.g., total wire length and circuit
timing), but doesn’t make actual connections or assign nets to specific paths within the routing
regions. By default, the width of a gcell is the same as the height of a standard cell, and gcells are
aligned with the standard-cell rows.
Blockages, pins, and routing tracks inside a gcell dictate its routing capacity. All nets assigned to
the gcell are then noted, the demand for wire tracks in each gcell is calculated, and overflows are
reported.
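A toy illustration of this capacity/demand bookkeeping (hypothetical per-gcell numbers; real routers track capacity per layer and per direction):

# For each gcell: overflow = max(0, routing demand - routing capacity).
gcells = {
    # name          (capacity_tracks, demanded_tracks)
    "gcell_12_40":  (18, 14),
    "gcell_12_41":  (18, 22),   # congested
    "gcell_13_40":  (10, 10),   # capacity reduced by a blockage
}
for name, (cap, demand) in gcells.items():
    print(f"{name}: capacity={cap}, demand={demand}, overflow={max(0, demand - cap)}")
total_overflow = sum(max(0, d - c) for c, d in gcells.values())
print(f"total overflow = {total_overflow}")   # 4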
Global routing is done in two stages namely-
The initial routing stage, wherein the unconnected nets are routed and overflow for each
gcell is calculated.
Rerouting stages, where the congestion around gcells with net overflows is reduced
by ripping up and rerouting the nets.
After the initial routing stage and each rerouting stage, design statistics and congestion data are
reported, along with a summary of wire length and via count at the end of the global routing stage.
There are three types of Global Routing namely-
Timing-Driven Global Routing- The net delays are calculated before global routing.
Cross-Talk Driven Global Routing- Avoids the creation of long tile-to-tile paths that
run parallel on adjacent tracks.
Incremental Global Routing- Performed using existing global route information.
Track Assignment
Track assignment is a stage wherein routing tracks are assigned to each global route. The
tasks that are performed during this stage are as follows-
Assigning tracks in horizontal and vertical partitions.
Rerouting all overlapped wires.
Track Assignment replaces all global routes with actual metal layers. Although all nets are
routed (though not very carefully), there will be many DRC, SI and timing-related violations,
especially in regions where the routing connects to the pins. These violations are fixed in the
succeeding stages.
Detail Routing
The detailed router uses the routing plan laid down during Global Routing and Track
Assignment and lays actual metal to logically connect pins with nets and other pins in the
design. The violations that were created during the Track Assignment stage are fixed through
multiple iterations in this stage.
The main goal of detailed routing is to complete all of the required interconnect without leaving
shorts or spacing violations. The detailed routing starts with the router dividing the block into
specific areas called switch boxes or Sbox, which are generally expressed in terms of gcells.
These boxes align with the gcell boundaries. For example, a 3x3 Sbox is a box which encompasses
9 gcells.
Search And Repair
The search-and-repair stage is performed during detailed routing, after the first iteration. In
search-and-repair, shorts and spacing violations are located, and the affected areas are rerouted
to fix as many violations as possible.
Routing optimization and Chip Finishing
ROUTING OPTIMIZATION
Routing optimization is a step performed after detailed routing in the flow.
Inaccurate modeling of the routing topology may cause timing, signal integrity and
logical design constraint related violations.
This may cause conditions wherein fixing a violation creates other violations; many such
scenarios can cascade, making it very difficult to reach timing closure with no timing
DRCs.
Hence it is necessary to fix and optimize the routing topology.
Routing optimization involves-
Fixing timing violations.
Fixing LVS (opens & shorts).
Fixing DRCs.
Fixing Timing DRCs (Meet max transition, max capacitance and max fanout).
Finding & Fixing Antenna violations (using jumpers and antenna diodes).
Area and Leakage power recovery.
Fixing SI related issues.
Redundant via insertion.
This post will concentrate on finding & fixing antenna violations and on redundant via
insertion. The other topics will be covered in subsequent posts.
Antenna Violations
During IC fabrication, the wafer undergoes various processing steps, such as
metallization (laying of metal wires) and plasma etching (patterning of those layers).
During the metallization steps, some of the nets connected to gate terminals can be left
floating, as the upper metal layers have not been fabricated yet. In the plasma etching
process (widely used in recent fabrication processes), unwanted electrostatic charge can
accumulate on these floating nets, which act as antennas.
Typically, in ICs a net is driven from the source or drain of a device and connects to a
receiver gate terminal over a gate oxide. Gate oxides, being very thin at advanced
technology nodes, are susceptible to electrostatic discharge and are in danger of getting
ruptured by potentials higher than the breakdown voltage.
When this charge discharges through the device it can rupture the gate, thereby leading
to total chip failure. This phenomenon of an electrostatic charge being discharged into
the device is known as the "antenna effect".
Every fab has its own set of rules (depending on the technology in which the ICs are
being fabricated) to avoid antenna violations during IC design.
In order to prevent antenna problems, tools verify that, for each input pin, the metal antenna area
divided by the gate area is less than the maximum antenna ratio given by the foundry:
(Antenna area) / (Gate area) < (Max antenna ratio)
Gate area: The area of the transistor gates, i.e., the intersection of the diffusion and polysilicon
layers.
Antenna area: Total area of metal connected to the gate terminal.
Max antenna ratio: Maximum allowable ratio of antenna area to gate area.
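A minimal sketch of the check stated above (illustrative areas only; the real rules are per metal layer and come from the foundry rule deck):

def antenna_violation(metal_area_um2, gate_area_um2, max_antenna_ratio):
    # True if the net violates the (simplified) antenna rule above.
    return (metal_area_um2 / gate_area_um2) >= max_antenna_ratio

# Hypothetical numbers: 60 um^2 of metal hanging on a 0.12 um^2 gate, limit 400.
print(antenna_violation(60.0, 0.12, 400.0))   # True -> needs a jumper or an antenna diode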
Fixing antenna violations
There are many techniques to fix antenna violations. The widely used techniques are described
in this post.
Antenna Diode:
Reverse-biased diodes (Zener-like diodes) are inserted close to the gate terminal to provide a
discharge path for the electrostatic charge during plasma etching.
This reverse-biased diode does not affect the operation of the circuit, as it conducts only
if the potential reaches its breakdown voltage, thus protecting the respective gate from
damage.
The general rule of practice is to use an n-type diode, as p-type diodes require extra
n-well biasing.
Fig1: Antenna Diode.
Limitations of using antenna diodes:
Wastage of core area – If the number of antenna violations is large, the overuse of antenna
diodes eats up the core area meant for standard cells.
These diodes also consume extra placement and routing resources, which becomes costly.
Potential forward biasing of the diode – The antenna diodes are usually placed in a back-
to-back fashion where there is an antenna violation. If they are not biased (reverse
biased) in the correct way, there is a potential for one of the diodes to become forward
biased. This is usually seen in low-power designs when one of the power sources gets
switched off. One such case is shown in the example figure below. Here D2 is always in
a reverse-biased condition. But in the case of D1, if the cathode supply is turned off, it
gets forward biased due to the reverse biasing of D2. Hence, a lot of care must be taken
such that both anode and cathode are at the same potential, or the anode is at ground
potential.
IR DROP:
Methods to reduce IR drop:
Reducing load – Cells driving more load will draw more current. Hence reducing the
load will reduce IR drop.
Downsizing – Cells of smaller size will draw less current. But the transition of the cells
should not become worse.
The number of power switches can be increased to reduce IR drop.
It should be made sure that all the power pins of macros are properly connected to the
power rails.
Note:
For accurate dynamic analysis, a VCD (switching activity) file together with SDF (standard
delay format) delays is better.
Glitches produced by combinational circuits act as extra instantaneous switching events.
Reducing them will decrease the pessimism of dynamic IR-drop analysis.
IR drop analysis is done in the RC-worst corner (the corner with the highest rail resistance)
and at the FF process, high-voltage and high-temperature PVT corner, because the most
current is drawn in this corner.
ELECTRO MIGRATION (EM):
Electromigration (EM) refers to the unwanted movement of material in the metal interconnect of
a semiconductor chip. If the current density is high enough, there is a momentum transfer from
the moving electrons to the metal ions, making the ions drift in the direction of the electron flow.
This results in the gradual displacement of metal atoms, potentially causing open and short
circuits. Due to the high current density and metal resistance in recent technologies, EM has
become a dominant concern.
EM leads to open circuits due to voids in wires or vias and leads to short circuits due to
extrusions or “hillocks” on wires. Either can cause a system failure that is hard to
diagnose.
In older technology nodes, EM was considered only on power wires and clock wires.
But now signal wires also need to be considered due to the increased current density
in them.
FinFETs have a higher current density than planar transistors, thus making EM worse,
especially in conjunction with narrow wires.
Copper interconnects worsen EM because the copper atoms move faster.
In recent technologies the lower supply voltages help to reduce EM, but not enough to
offset all the other factors that amplify it.
EM is worse at higher temperatures.
EM fixing techniques, such as widening wires, can increase area and cause timing
violations. EM fixing needs to be timing-driven.
Methods to fix EM
1. Widen the wire to reduce current density
2. Keep the wire length short
3. Reduce buffer size in clock line
Not in PD point of view
1. Reduce the frequency
2. Lower the supply voltage
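A simplified sketch of the kind of check an EM rule implies (a width-based current limit; the mA-per-micron limit and the wire numbers below are invented, as real limits are layer-, temperature- and foundry-specific):

def em_ok(i_rms_ma, width_um, limit_ma_per_um):
    # Check an RMS-current EM limit expressed as mA per um of wire width.
    return i_rms_ma <= width_um * limit_ma_per_um

# Hypothetical signal wire: 1.2 mA RMS on a 0.10 um wide wire, limit 5 mA/um.
print(em_ok(1.2, 0.10, 5.0))   # False -> widen the wire
print(em_ok(1.2, 0.30, 5.0))   # True after widening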
SCAN Tracing:
Files (Scan DEF and .V)
In scan tracing we check the connections of the flip-flops in the scan chain; there should not be
any floating connections. The reason for doing scan tracing is that in the formality check we
disable the scan mode (so it does not check the scan chain), and scan tracing assures that there
is no issue with the scan chains.
DFM:
Files (GDS and Rule deck file)
As the technology scales down, the manufacturing process becomes more complex. DFM
(design for manufacturing) is the stage in which we modify the design or add extra structures
(like redundant via insertion and wire spreading). These techniques increase the yield and
reliability of the design.
Few DFM steps:
1. Redundant Via insertion.
2. Wire spreading.
3. Wire slotting.
4. Metal filling.
Formality Check:
Files (Reference netlist, Implemented Netlist, .V and .Lib)
The basic idea behind the FM check is to compare the implemented netlist with the reference
netlist (synthesis-stage netlist / golden netlist). We check whether the logic produces the same
output values in both netlists.
Example 1: If we run the FM check in scan mode (i.e., with scan ON), we will get formality
issues, because during scan-chain reordering the positions of the flip-flops in the chain are
changed with respect to the scan DEF file. To overcome this issue, we have to disable the scan
port (by constraining its value to "0").
Example 2: Undriven-port issue: In the golden netlist, binary values like "0" or "1" are assigned
to the floating pins, but in the implemented netlist a floating pin is assigned "X", which leads to
a mismatch. To resolve this issue, we set the pins in both the implemented and the reference
netlist to either "0" or "1".
Power Analysis:
Files (SPEF, SAIF,.V, Lib, UPF and SDF)
In power analysis we calculate the power dissipation. There are two types of power dissipation:
(i) leakage power and (ii) dynamic power. Leakage power is basically static power, which is
dissipated during the off state or the non-toggling state (when the input data is fixed) of the
device. For the dynamic power, the activity factor is required, which is present in the SAIF
(switching activity interchange format) file. We also check for hot spots in the design; a hot spot
is basically a small region where higher power dissipation is present.