Synthesis I

ROHIT KHANNA
Advanced Computing Training School (ACTS)
C-DAC, Pune

2015 Centre for Development of Advanced Computing

Synthesis

HDL Design Flow


HDL CODE
HDL Simulation
SYNTHESIS
Gate Level Simulation
PLACE & ROUTE
Post Layout Simulation


HDL Simulation
Test inputs are applied to the HDL code.

module gate (input a, b,
             output c);
  assign c = a & b;
endmodule

[Figure: test bench applying inputs to the module and observing output C]

Gate Level Simulation


Test inputs are applied to the gate-level netlist.

[Figure: test bench driving the gate-level netlist with inputs A and B and observing output C]

Post Layout Simulation


Test inputs are applied to the logic on the device, with delays included.

[Figure: test bench driving a LUT with inputs A and B and observing output C; the path includes interconnect + logic delay]

Comparing Synthesis and Simulation Results

Timing Statements
HDLs were initially meant for documentation and simulation, so many simulation constructs are not supported by synthesis.

Wait for statement (VHDL)
This statement halts the execution of a process for a specified time period.
This construct is not supported by synthesis tools.
wait for 10 ns;

When it is used in code meant for synthesis, the tool will report an error.

Timing Statements
After clause (VHDL) or # delay (Verilog)
This construct delays the assignment by a specified time.
It is ignored by synthesis tools.
a <= b after 10 ns; (VHDL)
or
#10 a = b; (Verilog)

HDL (RTL) simulation results and gate-level simulation results will not match.
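
As a minimal sketch of the point above (module and signal names are hypothetical, not from the original slides), the model below uses a # delay that a simulator honors but a synthesis tool would typically drop. The delay in the test bench is harmless, because a test bench is never synthesized.

```verilog
// Simulation honors the #10 delay; synthesis is expected to ignore it
// and build a plain connection from b to a.
module delayed_copy (input wire b, output wire a);
  assign #10 a = b;
endmodule

// Test bench (simulation only): delays are appropriate here.
module delayed_copy_tb;
  reg  b;
  wire a;

  delayed_copy dut (.b(b), .a(a));

  initial begin
    b = 1'b0;
    #20 b = 1'b1;   // in RTL simulation, a follows 10 time units later
    #20 $display("time=%0t a=%b", $time, a);
    $finish;
  end
endmodule
```

After synthesis the 10-unit delay no longer exists in the netlist, so any test that depends on it will behave differently in gate-level simulation.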



If-else and Case
Synthesis tools support both statements, but implement them differently.
Simulation results are identical for both.
An if-else statement generates a priority-based structure.
A case statement generates a parallel structure.
When the if-else ladder is large, it can result in a slower circuit.
So although the simulation results match, the synthesis results are not the same.

If-else and Case
If-else
always @ (*)
  if (sel == 2'b00)
    op = a;
  else if (sel == 2'b01)
    op = b;
  else if (sel == 2'b10)
    op = c;
  else
    op = d;

case
always @ (*)
  case (sel)
    2'b00 : op = a;
    2'b01 : op = b;
    2'b10 : op = c;
    default : op = d;
  endcase
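
The priority behavior of an if-else ladder is easiest to see when the conditions can overlap. The sketch below is an illustrative assumption (the module name and the req/grant signals are hypothetical): each branch is reached only when all earlier conditions are false, which is what forces the tool to build a priority chain.

```verilog
// Hypothetical 3-request priority selector.
module priority_grant (input wire [2:0] req, output reg [1:0] grant);
  always @(*) begin
    if (req[0])        // highest priority: checked first
      grant = 2'd0;
    else if (req[1])   // reached only when req[0] is 0
      grant = 2'd1;
    else if (req[2])   // reached only when req[0] and req[1] are 0
      grant = 2'd2;
    else
      grant = 2'd3;    // no request active
  end
endmodule
```

When the conditions are mutually exclusive, as with the sel comparisons above, a case statement expresses the same function without implying a priority order.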


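If-else and Case
[Figure: the if-else code synthesizes to a chain of three 2:1 muxes selecting among a, b, c and d (stages controlled by Sel=10, Sel=01 and Sel=00); the case code synthesizes to a single mux with inputs a, b, c, d and select sel]
If-else: IO delay depends upon the sel line.
Case: IO delay is the same.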

Initial Value
Don't assign initial values, as they are ignored by synthesis tools (as per synthesis standards).
This is true for an ASIC. Synthesis tools for FPGAs may not ignore initial values.
The functionality of the simulated design may not match that of the synthesized design.
Example
VHDL
signal a : integer := 7;

Verilog
reg [3:0] a = 4'b1011;
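
One common way to avoid depending on initial values is to load the starting value through an explicit reset. The following is a minimal sketch under that assumption; the module name, the rst port and the increment are hypothetical and only illustrate the pattern.

```verilog
// The register gets its starting value from a synchronous reset,
// so simulation and synthesized hardware start from the same state
// once rst has been asserted.
module init_by_reset (input wire clk,
                      input wire rst,
                      output reg [3:0] a);
  always @(posedge clk) begin
    if (rst)
      a <= 4'b1011;     // same value the initializer would have given
    else
      a <= a + 4'd1;    // placeholder behavior for illustration
  end
endmodule
```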


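Sensitivity List
An incomplete sensitivity list results in a mismatch between synthesis and simulation results.
The sensitivity list is ignored by synthesis tools but not by simulation tools.
Example
Verilog (complete sensitivity list)
always @ (*)
  if (sel == 1)
    op = a;
  else
    op = b;

VHDL (incomplete sensitivity list: a and b are missing)
process (sel) begin
  if (sel = '1') then
    op <= a;
  else
    op <= b;
  end if;
end process;

Result
Synthesis: MUX
Simulation (with the incomplete list): won't behave as a mux, because op is not updated when a or b changes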
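
The same mismatch can be reproduced in Verilog. The two modules below are a sketch with hypothetical names: the first lists only sel, so a simulator re-evaluates it only when sel changes, while synthesis is expected to build an ordinary mux for both.

```verilog
// Incomplete list: in simulation, op is stale after a or b changes
// until sel toggles again.
module mux_incomplete (input wire sel, a, b, output reg op);
  always @(sel)
    if (sel)
      op = a;
    else
      op = b;
endmodule

// Complete list: simulation and synthesis both behave as a mux.
module mux_complete (input wire sel, a, b, output reg op);
  always @(*)
    if (sel)
      op = a;
    else
      op = b;
endmodule
```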


Coding Guidelines

Use shorthand expressions

Longhand (VHDL)
process (A, B)
begin
  C(3) <= A(3) and B(3);
  C(2) <= A(2) and B(2);
  C(1) <= A(1) and B(1);
  C(0) <= A(0) and B(0);
end process;

Longhand (Verilog)
and (C[3], A[3], B[3]);
and (C[2], A[2], B[2]);
and (C[1], A[1], B[1]);
and (C[0], A[0], B[0]);

Shorthand (VHDL)
process (A, B)
begin
  for i in 0 to 3 loop
    C(i) <= A(i) and B(i);
  end loop;
end process;

Shorthand (Verilog, array of gate instances)
and g [3:0] (C, A, B);
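
In Verilog the same 4-bit AND can also be written as a single vector assignment, which is usually the shortest form. This is a small sketch; the module name and4 is hypothetical.

```verilog
// Bitwise AND of two 4-bit vectors in one continuous assignment.
module and4 (input wire [3:0] A, B, output wire [3:0] C);
  assign C = A & B;
endmodule
```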


Avoid use of Buffer port
Buffer ports are used when you want to read back your own output port.
Buffer ports are not considered good design practice.
Instead, the user should create a dummy signal that is fed back to the logic and assigned to the output port.

Avoid use of Buffer port

With a buffer port
entity xor_feed is
  port (a : in std_logic;
        b : buffer std_logic);
end entity;

architecture ach of xor_feed is
begin
  b <= a xor b;
end architecture;

With an out port and a dummy signal
entity xor_feed is
  port (a : in std_logic;
        b : out std_logic);
end entity;

architecture ach of xor_feed is
  signal temp : std_logic;
begin
  temp <= a xor temp;
  b <= temp;
end architecture;

Unnecessary loop calculation
Avoid placing non-changing expressions inside a loop.
This prevents the tool from spending more time on optimizing redundant logic.
Example

Before
for (i = 0; i < 5; i = i + 1)
begin
  // unchanging expression
  a = b;
  data[i] = din[i];
end

After
a = b;
for (i = 0; i < 5; i = i + 1)
begin
  data[i] = din[i];
end

Optimizing Arithmetic Expressions
Synthesis tools try to rearrange an expression in order to achieve an optimized implementation.
There are three types of arithmetic optimization:
Merging Cascaded Adders
Arranging Expression Trees
Sharing Common Expressions

Merging Cascaded Adders
Assume that the design has two cascaded adders and one of the inputs is a single bit.

module add (input c,
            input [1:0] a, b,
            output [2:0] z);
  wire [2:0] t;

  assign t = a + b;
  assign z = t + c;
endmodule

Merging Cascaded Adders
In such a case the tool will optimize the design to one adder with a carry input (full adder).

assign t = a + b;
assign z = t + c;
or
assign t = a + c;
assign z = t + b;
or
assign z = a + b + c;

[Figure: a single adder with C as the carry input producing Z]

Determine Number of Adders
Apply the merging-cascaded-adders concept and determine the number of adders required.

module add (input [1:0] a, b, d,
            output [3:0] z);
  wire [2:0] t;
  assign t = a + d;
  assign z = t + b;
endmodule

Answer: 2 adders, since there is no 1-bit input.

Arranging Expression Trees
Tools try to optimize an expression in order to meet speed requirements by minimizing delay.
The rearrangement of adders should be done depending upon the arrival time of each signal.

module add (input [1:0] c, d,
            input [1:0] a, b,
            output [3:0] z);
  assign z = a + b + c + d;
endmodule

Arranging Expression Trees
The tool treats the code as if brackets were present in the HDL code, as given below.
Z = a + b + c + d;
Z = ((a + b) + c) + d;

[Figure: three adders in series, 5 ns each, from input A to output Z]

Arranging Expression Trees
The adder arrangement can be modified to achieve minimum delay, depending upon the arrival time of the signals, by applying brackets.
z = a + b + c + d; Assume that a, b, c and d arrive at the same time.
In this case the delay can be reduced if we replace the 3-stage adder chain with a 2-stage adder tree.
This can be achieved by applying brackets wisely.

Arranging Expression Trees
Z = (a + b) + (c + d);

[Figure: two adders in parallel (5 ns each) feeding a third adder (5 ns) that produces Z, so the path passes through two adder stages instead of three]

Arranging Expression Trees
Z = a + b + c + d;
Assume that a, b and d arrive at the same time and c is the last one to arrive.
In this case c should be added last.
Rearranging:
Z = a + b + d + c;
Now a, b and d should ideally be added at the same time, but this is not possible since an adder has only two inputs.
Z = ((a + b) + d) + c;

Arranging Expression Trees
Z = ((a + b) + d) + c;
or
Z = ((d + b) + a) + c;
or
Z = ((d + a) + b) + c;

[Figure: three adders in series, 5 ns each; A and B feed the first adder and the late-arriving C feeds the last adder, which produces Z]

Arranging Expression Trees
Arrangement considering overflow in adders
Example 1
module add (input [5:0] c,      // 6 bits
            input [3:0] a, b,   // 4 bits
            output [6:0] z);    // 7 bits
  wire [3:0] t;        // wire declared to store the result of a + b
  assign t = a + b;    // 5-bit result truncated to 4 bits
  assign z = t + c;
endmodule

Arranging Expression Trees
Synthesis Outcome
[Figure: A [4 bits] and B [4 bits] feed the first adder; its output T [4 bits] and C [6 bits] feed the second adder, which produces Z [7 bits]]

Arranging Expression Trees
Example 2
module add (input [5:0] c,      // 6 bits
            input [3:0] a, b,   // 4 bits
            output [6:0] z);    // 7 bits
  assign z = a + b + c;         // no temporary variable declared
endmodule
The tool understands that the sum of a and b may produce a 5-bit result, so it automatically uses a temporary result wide enough (5 bits) to keep the carry.
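
The difference between Example 1 and Example 2 can be checked with a small simulation. The test module below is a sketch (the module name and the chosen input values are assumptions): with a = 15 and b = 1 the sum a + b is 16, which does not fit in 4 bits, so the 4-bit wire t holds 0 and z1 loses the carry, while z2 is evaluated at the full 7-bit width of the result.

```verilog
module width_demo;
  reg  [3:0] a, b;
  reg  [5:0] c;

  wire [3:0] t  = a + b;       // Example 1 style: carry of a + b is dropped
  wire [6:0] z1 = t + c;
  wire [6:0] z2 = a + b + c;   // Example 2 style: no 4-bit intermediate

  initial begin
    a = 4'd15; b = 4'd1; c = 6'd3;
    #1 $display("z1=%0d z2=%0d", z1, z2);   // z1 is 3, z2 is 19
    $finish;
  end
endmodule
```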


Arranging Expression Trees
Synthesis Outcome. The addition result is different from Example 1.
[Figure: A [4 bits] and B [4 bits] feed the first adder; its output T [5 bits] and C [6 bits] feed the second adder, which produces Z [7 bits]]

Arranging Expression Trees
Example 3
module add (input [5:0] c,      // 6 bits
            input [3:0] a, b,   // 4 bits
            output [6:0] z);    // 7 bits
  assign z = a + b + c;
endmodule
Assume A is the late-arriving signal; B and C arrive at the same time.

Arranging Expression Trees
Synthesis Outcome. The addition result is the same as in Example 2.
Z = (b + c) + a;
[Figure: B [4 bits] and C [6 bits] feed the first adder; its output T [7 bits] and the late-arriving A [4 bits] feed the second adder, which produces Z [7 bits]]

Sharing Common Expressions
If the same expression appears in more than one equation, the user will want to share the expression to reduce area.
One option is to assign the common expression to a temporary variable.
Example
Z = a + b + c;
Y = a + c + d;

a + c is a common expression:

temp = a + c;
Z = temp + b;
Y = temp + d;

Sharing Common Expressions
Tools are intelligent enough to identify a common expression and share it automatically if it is specified in the same order.

Example
Z = a + b + c;
Y = a + b + d;
The tool uses a common adder for a + b.

Z = a + b + c;
Y = d + a + b;
The tool uses different adders for a + b.

Sharing Common Expressions
How can such a problem be avoided?

Solution
Use brackets (parentheses) around the common expression.

Z = (a + b) + c;
Y = d + (a + b);

The tool uses the same adder for a + b.

Sharing Common Expressions
In some cases sharing a common expression may result in more logic resources.
Example
if (sel1)
  Y <= a + b;
else
  Y <= c + d;

if (sel2)
  Z <= e + f;
else
  Z <= a + b;

Sharing enabled: three adders required
a + b
c + d
e + f

Sharing disabled: two adders required
a + b or c + d
a + b or e + f

Sharing Common Expressions
Synthesis result when sharing is enabled
[Figure: the adder outputs are selected by two muxes, one per select line]

Sharing Common Expressions
Constant Propagation
If the outcome of an expression is a constant, no logic is generated for it.
The constant value is provided directly.

Example
input [31:0] c;
output [31:0] d;
integer a = 3;
wire [31:0] b;
assign b = a + 2;   // constant result: logic not generated
assign d = b + c;   // the only adder

[Figure: a single adder; the logic for a + 2 is not generated]

Sharing Common Expressions
Logic resources required by an adder
[Figure: adder with carry input Cin and outputs Sum and Carry]
Gate count = 5

Sharing Common Expressions
Logic resources required by a multiplexer
[Figure: mux with data inputs D0 and D1 and select S]
Gate count = 4
Conclusion: adders require more area and hence are costly.

Sharing Common Expressions
Sharing Complex Operators
assign z = x ? (a + b) : (c + d);

[Figure: two adders feeding a mux selected by X, with output Z]

The output is either (a + b) or (c + d), which means only one addition is needed at a time. The other adder is still present and active, resulting in additional area, cost and power.

Sharing Common Expressions
Sharing Complex Operators
assign t1 = x ? a : c;
assign t2 = x ? b : d;
assign z = t1 + t2;

Results in a reduction of:
Cost
Area
Power requirements

[Figure: two muxes produce t1 and t2, which feed a single adder with output Z]

Signal and Variable
Signals / non-blocking statements update at the end of the process / always block, whereas variables / blocking statements update immediately.
Example 1
VHDL
process (a, b, c, d)
begin
  d <= a;
  x <= c xor d;
  d <= b;          -- overrides d <= a
  y <= c xor d;
end process;

Verilog
always @ (*)
begin
  d <= a;
  x <= c ^ d;
  d <= b;
  y <= c ^ d;
end

Signal and Variable
Synthesis Result
[Figure: a single XOR gate, with inputs A, B, C and output Y shown]

Signal and Variable
Example 2
VHDL
process (a, b, c)
  variable d : std_logic;
begin
  d := a;
  x <= c xor d;
  d := b;
  y <= c xor d;
end process;

Verilog
always @ (*)
begin
  d = a;
  x <= c ^ d;
  d = b;
  y <= c ^ d;
end

Signal and Variable
Synthesis Result
[Figure: two XOR gates, one for each output]

High Performance Coding Techniques

Multiplexer using Tristate
In the Spartan family (XC4000), a 4:1 multiplexer can be implemented using a single CLB.
If the user wants to implement a 16:1 multiplexer, how many CLBs are required? Five (four 4:1 multiplexers plus one to combine their outputs).
As the number of CLBs increases, area requirements and delay increase.
Xilinx recommends using internal tristate buffers to implement a multiplexer that requires more than one CLB.

Multiplexer using Tristate
One Hot Encoded
assign out = sel[0] ? a : 1'bz;
assign out = sel[1] ? b : 1'bz;
assign out = sel[2] ? c : 1'bz;
assign out = sel[3] ? d : 1'bz;
assign out = sel[4] ? e : 1'bz;

[Figure: five tristate buffers, enabled by Sel[0] to Sel[4], driving the shared line out]

Multiplexer using Tristate
Binary Encoded
assign out = (sel == 2'b00) ? a : 1'bz;
assign out = (sel == 2'b01) ? b : 1'bz;
assign out = (sel == 2'b10) ? c : 1'bz;
assign out = (sel == 2'b11) ? d : 1'bz;

[Figure: four tristate buffers, enabled by Sel=00, Sel=01, Sel=10 and Sel=11, driving the shared line out]

I->O delay will remain the same if the number of inputs is increased.

Multiplexer using CLB
assign out = sel[1] ? (sel[0] ? d : c) : (sel[0] ? b : a);

[Figure: one CLB selects between A and B using S[0], a second CLB selects between C and D using S[0], and a third CLB selects between their outputs using S[1]]

I->O delay will increase if the number of inputs is increased.

Multiplexer using Tristate
Features of a mux using tristate buffers
Selection lines can be one-hot encoded or binary encoded.
The number of inputs can be any number, depending upon the number of internal tristate buffers (5:1 mux, 9:1 mux, etc.).
CLBs become available for placing other relevant logic.
The size of the multiplexer has minimal effect on area and delay.

Multiplexer using Tristate
Issues with using tristate buffers
If more than one tristate buffer is ON at a time, the result is multiple drivers on the line, which increases power consumption and reduces chip reliability.
Since the use of tristate buffers is specific to some device families, porting the design becomes difficult.
The designer has to take care that the bus is not left floating; a weak keeper has to be added.
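
The first and last issues can be observed in simulation. The test module below is a sketch with hypothetical names: two tristate drivers share one net, and the select value is stepped through the legal, contending and floating cases.

```verilog
module tristate_issue_demo;
  reg  [1:0] sel;
  reg        a, b;
  wire       out;

  // Two tristate drivers on the same net, one-hot enabled.
  assign out = sel[0] ? a : 1'bz;
  assign out = sel[1] ? b : 1'bz;

  initial begin
    a = 1'b0; b = 1'b1;
    sel = 2'b01; #1 $display("sel=%b out=%b", sel, out); // one driver: out = a
    sel = 2'b11; #1 $display("sel=%b out=%b", sel, out); // both ON with opposite values: x
    sel = 2'b00; #1 $display("sel=%b out=%b", sel, out); // none ON: floating (z)
    $finish;
  end
endmodule
```

In hardware the contending case corresponds to two buffers driving against each other, and the floating case is what the weak keeper is meant to prevent.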


Pipelining
Pipelining is an approach in which long combinational paths are broken by placing flip-flops in between.
A pipelined design tends to increase operating frequency at the expense of more logic resources (area).

Without pipelining
module addition (input clk, a, b, c, d, e, f, g, h, output reg result);
  always @ (posedge clk)
  begin
    result <= ((a | b) & (c ^ d)) ^ ((e ~^ f) | (g & h));
  end
endmodule

Pipelining
Synthesis Results (without pipelining)
[Figure: combinational network of OR, XOR, XNOR and AND gates on inputs A to H, with gate delays of 5 ns, 6 ns and 7 ns, feeding a single F/F clocked by CLK]
Operating Frequency = 1/(20 ns) = 50 MHz

Pipelining
With pipelining
module addition (input clk, a, b, c, d, e, f, g, h, output reg result);
  reg temp1, temp2, temp3, temp4, temp5, temp6;   // pipeline registers
  always @ (posedge clk)
  begin
    temp1 <= a | b;
    temp2 <= c ^ d;
    temp3 <= e ~^ f;
    temp4 <= g & h;
    temp5 <= temp1 & temp2;
    temp6 <= temp3 | temp4;
    result <= temp5 ^ temp6;
  end
endmodule

Pipelining
Synthesis Results (with pipelining)
[Figure: the same gates with an F/F after every gate, all clocked by CLK; the gate delays are 5 ns, 6 ns and 7 ns]
Operating Frequency = 1/(7 ns), approximately 143 MHz

Reduce Complex Operation
Arithmetic and relational operators require more logic resources and hence are expensive.
The approach should be to avoid them where possible in order to achieve better resource utilization.
Example 1
reg [15:0] count;
always @ (posedge clk)
begin
  count = count + 1;
  if (count > 16'b1010_1001_1011_0110)   // relational operator
    count = 0;
end
Number of 4-input LUTs: 60 out of 21504

Reduce Complex Operation
Example 2
reg [15:0] count;
always @ (posedge clk)
begin
  count = count + 1;
  if (count == 16'b1010_1001_1011_0111)   // equality operator
    count = 0;
end
Number of 4-input LUTs: 54 out of 21504

== requires less logic than >.
The tool does not understand that count will never go above 1010_1001_1011_0110.

Constant Propagation
Constants are pushed into the logic to reduce area requirements.
CoolRunner-II CPLDs

Example 1
[Figure: a MUX with a constant 1 and signals B and C as inputs, followed by the simplified logic (inputs 1 and B, output Z) obtained after constant propagation]

Constant Propagation
Example 2: Determine the optimized outcome of this mux.
[Figure: a MUX with a constant 0 and signal A as inputs]

Constant Propagation
[Figure: optimized result built from discrete gates, with signals A, X, Y and output Z]
Instead of using an optimized mux from the library, the logic is created out of discrete gates.
To prevent this, we use a no-boundary-optimization attribute so that timing is not affected.

Register Duplication
Register duplication is an approach to reduce fan-out and improve timing.
If register duplication is disabled and a register is driving a large number of loads, this may affect the timing requirements of the system.
To achieve better timing, it is recommended to enable register duplication, which distributes the load.

Register Duplication
Example 1: register duplication disabled

module reg_duplicate1 (input clk, din, output [63:0] x);
  reg q;
  assign x = {64{q}};
  always @ (posedge clk)
  begin
    q <= din;
  end
endmodule

Register Duplication
Synthesis Results
[Figure: a single F/F (Reg A), with inputs din and clk, drives X[0] through X[63]]
Reg A drives all 64 loads.
Imagine the amount of delay between x[0] and x[63].
Because of this delay, timing will suffer.

Register Duplication
Example 2: register duplication enabled

module reg_duplicate2 (input clk, din, output [63:0] x);
  reg q;
  assign x = {64{q}};
  always @ (posedge clk)
  begin
    q <= din;
  end
endmodule

Register Duplication
Synthesis Result 1
[Figure: two F/Fs with the same din and clk; Reg A drives X[0] to X[31] and Reg B drives X[32] to X[63]]
Loads are distributed between Reg A and Reg B.
The delay is lower than in Example 1.
This is the ideal case of register duplication.

Register Duplication
Synthesis Result 2
[Figure: several F/Fs sharing din and clk, each driving a subset of X[0] to X[63]]
In an FPGA the number of flip-flops is high.
The tool will try to place an F/F at each and every load.
In this case the timing results achieved would be the best.

Operator in If Statement
If any signal present in a conditional expression is a late-arriving signal, it should be moved closer to the output.
Example 1
VHDL
process (A, B, C, D)
begin
  if (A + B < 24) then
    Z <= C;
  else
    Z <= D;
  end if;
end process;

Verilog
always @ (*)
begin
  if (A + B < 24)
    Z <= C;
  else
    Z <= D;
end

Operator in If Statement
Synthesis Result
[Figure: ADD (A + B) feeds COMP (compare with 24), whose output selects between C and D in the MUX]

Operator in If Statement
Assume that A is the late-arriving signal. In that case perform the other calculations first and then compare with A.
Example 2
VHDL
process (A, B, C, D)
begin
  if (A < 24 - B) then
    Z <= C;
  else
    Z <= D;
  end if;
end process;

Verilog
always @ (*)
begin
  if (A < 24 - B)
    Z <= C;
  else
    Z <= D;
end
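
One caveat worth checking before applying this rewrite in Verilog: with unsigned operands, 24 - B wraps around when B is greater than 24, so the two conditions are no longer equivalent. The test module below is a sketch under the assumption that A and B are 8-bit unsigned values (the slides do not give their widths); all names are hypothetical.

```verilog
module cmp_rewrite_demo;
  reg [7:0] A, B;

  wire before_rewrite = (A + B < 24);   // Example 1 condition
  wire after_rewrite  = (A < 24 - B);   // Example 2 condition

  initial begin
    A = 8'd5; B = 8'd10;   // B <= 24: both conditions are true
    #1 $display("A=%0d B=%0d before=%b after=%b", A, B, before_rewrite, after_rewrite);
    A = 8'd0; B = 8'd30;   // B > 24: 24 - B wraps, so the conditions differ
    #1 $display("A=%0d B=%0d before=%b after=%b", A, B, before_rewrite, after_rewrite);
    $finish;
  end
endmodule
```

If B can exceed 24 in the real design, signed arithmetic or an extra range check would be needed to keep the rewritten condition equivalent.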


Operator in If Statement
Synthesis Result
[Figure: SUB (24 - B) feeds COMP (compare with A), whose output selects between C and D in the MUX]

Fan-out Control
The larger the fan-out, the greater the delay.
If the fan-out is beyond the limit, the tool adds buffers, which further reduces speed.
The code has to be modified so that the load on any one signal is reduced, which results in better timing.

Fan-out Control
Example 1
(* max_fanout = "2" *) reg a;
assign e = a ? 1 : 0;
always @ (posedge clk)
  a = b & c;

always @ (*)
begin
  d = a ^ b;
  f = (a ^ c) | (a & b);
end

Fan-out Control
Synthesis Result (Spartan 3)
Register Duplication: No

Fan-out Control
Example 2
(* max_fanout = "2" *) reg a, a1;
assign e = a1 ? 1 : 0;
always @ (posedge clk)
begin
  a  = b & c;
  a1 = b & c;
end
always @ (*)
begin
  d = a ^ b;
  f = (a ^ c) | (a1 & b);
end

Fan-out Control
Synthesis Result (Spartan 3)
Register Duplication: No
Equivalent register removal: No

Data path Duplication
module path_dup (
  input [7:0] a, b,
  input [15:0] address,
  input control,
  output reg [15:0] out);

  parameter [7:0] base1 = 8'h35;
  parameter [7:0] base2 = 8'h47;
  reg [7:0] temp, c;
  reg [15:0] value;

  always @ (*)
  begin
    if (control == 1)
      temp = a;
    else
      temp = b;
    c = base1 - temp;
    value = address - {8'b0, c};
    out = value + base2;
  end
endmodule

Data path Duplication
[Figure: a MUX selected by Control produces Temp; SUB (Base1 and Temp), SUB (with Address) produces Value; ADD (with Base 2) produces Out]
Now assume that control is a late-arriving signal. In order to meet timing requirements, control has to be moved closer to out.

Data path Duplication
module path_dup (
  input [7:0] a, b,
  input [15:0] address,
  input control,
  output reg [15:0] out);

  parameter [7:0] base1 = 8'h35;
  parameter [7:0] base2 = 8'h47;
  reg [7:0] c1, c2;
  reg [15:0] value1, value2, out1, out2;

  always @ (*)
  begin
    c1 = base1 - a;
    c2 = base1 - b;
    value1 = address - {8'b0, c1};
    value2 = address - {8'b0, c2};
    out1 = value1 + base2;
    out2 = value2 + base2;
    if (control == 1)
      out = out1;
    else
      out = out2;
  end
endmodule

Data path Duplication
[Figure: two copies of the data path (SUB with Base1 giving C1 or C2, SUB with Address giving Value1 or Value2, ADD with Base 2 giving Out1 or Out2); a MUX selected by Control chooses between Out1 and Out2 to produce Out]

Physical Synthesis and Optimization with ISE 9.1i

Logic Duplication
A LUT or F/F may drive multiple loads, with one or more of those loads placed far from it.
In such a case the tool duplicates the logic and places the copy near that group of loads, which helps in achieving timing requirements.
This is called logic duplication.

Logic Duplication
[Figure: LUTs L1 to L4 and an F/F placed across CLB 1 to CLB 4]
L3 -> L2 -> L4 is a critical path.

Logic Duplication
[Figure: the same placement after optimization, with a second copy of L2 in CLB 4 next to L4 and the F/F]
L2 is duplicated and placed in CLB 4.

Logic Recombination
A critical path may travel through multiple LUTs.
In such a case the logic can be reassembled using fewer CLBs (slices) by combining LUTs and muxes in a more effective way.
This is called logic recombination.

Logic Recombination
[Figure: before recombination, LUTs L1 to L4 and an F/F are spread across CLB 1 to CLB 5]

Logic Recombination
[Figure: after recombination, the LUTs L1 to L4 and the F/F are regrouped so that the path uses fewer CLBs]

Basic Element Switching
A function may be implemented using a mux and LUTs present inside a slice.
In such a case the tool rearranges the function to achieve better timing results.
This is called basic element switching.

Basic Element Switching
Assume that L3 -> L1 -> Y is a critical path.
[Figure: before switching, LUTs L1 to L4 and an F/F placed across CLB 1, CLB 2, CLB 3 and CLB 5]

Basic Element Switching
[Figure: after switching, the same LUTs L1 to L4 and F/F with the function rearranged inside the slice]

Pin Swapping
Each input pin of a LUT has a different delay.
The tool (Map) can swap LUT pins so that critical paths use the LUT pins that offer the best speed.
This is called pin swapping.

Pin Swapping
Assume that pin 2 and pin 1 have minimum delay.
[Figure: before swapping, LUT L1 with input pins 2, 1 and 0, together with L2, L3, L4 and an F/F across CLB 1, CLB 2, CLB 3 and CLB 5]

Pin Swapping
[Figure: after swapping, the critical-path signals are connected to the faster pins of LUT L1]