Synthesis
Acknowledgment: François Boyer (boyerf@iro.umontreal.ca) contributed substantially to the development of the lab content.
VHDL Design Flow
General Design Flow; Top-down design; Description paradigms and abstraction levels; Data Flow Descriptions; Control Oriented Descriptions; Behavioral Descriptions; Behavioral Synthesis (input); Scheduling; Allocation; Design validation; Simulation and verification; RTL and behavioral design; VHDL Synthesizable Subset; Special attributes; Main Features of Behavioral Synthesis; RTL Descriptions; Scheduling and allocation illustration; Behavioral Compiler Design Flow; Steps of the BC Design Flow; References
EMA1997
Design Flow Example
VHDL code; Main loop after elaboration

BEHAVIORAL COMPILER
Objectives; Design flow; Inputs; Processing steps; BC internals: Control-dataflow graph; CDFG; CDFG nodes; Chaining, multicycling, and pipelining; Chaining, multicycling, and pipelining (Illustration); CDFG edges; Speculative execution; Templates; Scheduling; Allocation; Allocation criteria; Netlisting; Control FSM; States and csteps; BC constraints on loops; Invoking the scheduler

HDL descriptions and semantics
Objectives; Pre-synthesis model; The design; Behavioral processes; Clock and Reset; Synchronous resets; Asynchronous resets; I/O Operations; Flow of Control; Fixed bound FOR loops; General loops; Pipelined loops; Pipelined loops and Fixed I/O mode; Other I/O modes; Memory inference; Memory code; Memory timing; Other memory considerations; Synthetic components; DesignWare developer; Preserved functions; Pipelined components

I/O modes
I/O modes; Cycle-Fixed Mode; Cycle-Fixed Mode (Test bench); Fixed Mode rules (Straight line code); Fixed Mode rules (Loops); Loops in fixed mode; Nested loops and FM; Successive loops and FM; Complex loop conditions; Superstate-Fixed Mode; Superstate-Fixed Mode (Implications); Superstate Rules (continuing superstate); Superstate Rules (separating write orders); Superstate Rules (Conditional superstate); Superstate Rules (Escaping from the loop); Free-Floating Mode

Explicit Directives and Constraints
Labeling (Default naming); Labeling (user naming); Labeling (improved naming); Scheduling Constraints; Shell Variables; Shell Commands

RTL Design Methodology
RTL Design flow; Design refinement; HDL FF Code; HDL latch Code; HDL AND Code; MUX inference; MUX modeling; Synthesized gate-level netlist simulation; Netlist simulation (contd); Simulation of commercial ASICs; Design for Testability; Design Re-use; Designing with DW Components; FPGA Synthesis; Links to layout; DC and DA environments; Target, Link, and Symbol Libraries; Libraries generation

VHDL RTL SEMANTICS
Types, signals and variables; Buffer mode modeling; STD_LOGIC; Arithmetic; Unwanted latches; Asynchronous reset; Synchronous reset; VHDL specifics; Finite state machines; State encoding; HDL description of a state machine; Recommended style; Enumerated types and encoding; General description of FSM; Guidelines for FSM coding; fail-safe behavior; Memories; Memory behavior; Barrel shifter; Multi-bit register

Methodology for RTL synthesis
Objectives; Synthesis constraints; Design rule constraints; DRC; Related commands; Optimization constraints; Cost functions; Clock specification; Timing reports; Design after read; VHDL after READ; Basic Sequential Element; After compilation to lsi_10k; Reports; Set_dont_touch; Flattening; Structuring; Grouping and using; Characterization; Guidelines

Finite State Machines
Extracting FSMs; Coding FSMs in VHDL

VHDL Design Flow
LAB 1; Lab1 VHDL code; LAB 2; VHDL code lab 2; LAB 3; VHDL code LAB 3; LAB 4; VHDL code LAB 4; LAB 5; LAB 5 VHDL code; Protocol case study; LAB 6; LAB 6 VHDL CODE
Top-down design
+: Rough outlines explored at the highest possible level
+: Fine-grained optimization at lower levels
+: A wider variety of designs can be explored at the higher levels
Functionality is partitioned into blocks and processes
Blocks can be mapped to software or hardware
Control oriented
Emphasis on states and transitions
Ex: protocol descriptions
Graphical, language-based, or both: SDL
FIFOs
High-level synchronization mechanisms
Non-determinism
Output can be sent either to RTL synthesis or to behavioral synthesis,
  depending on whether the states correspond to circuit states or not
RTL
Language or graphical Hierarchy Finer optimizations Technology independent
Gate
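[Figure: data-flow graph computing the recurrence below from inputs u, a and b, using two multipliers, an adder and a delay element]
$x_k = a\,x_{k-1} + b\,u_k$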
Synthesis at either RTL or behavioral level
Behavioral Descriptions
Backend to dataflow or control-oriented descriptions
General purpose
Language based: VHDL, Verilog, C, ISP, Pascal, etc.
HDL advantages:
  Standardized
  Simulatable
  Readable interchange formats
Clock edges may be added during behavioral synthesis
If there are several processes, each process is scheduled independently
Mixed descriptions are allowed: glue logic, RTL processes, behavioral processes
Scheduling
Input = process
Output = FSM + datapath
Operations are assigned to states
  This is the user's responsibility in RTL synthesis
States are part of an FSM
  Additional states are added if allowed by the user
  States = actual machine states
  Transitions correspond to machine clock edges
The machine clock (10 MHz) may be much faster than the sample clock (50 kHz)
  In RTL and behavioral descriptions the machine clock is considered
  In data-stream descriptions the sample clock is considered
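As a worked figure for the two clock rates quoted on this slide, and assuming the machine takes exactly one state transition per machine clock edge, the number of machine states available to process one sample is:

```latex
\[
N_{\text{states per sample}}
  = \frac{f_{\text{machine}}}{f_{\text{sample}}}
  = \frac{10\ \text{MHz}}{50\ \text{kHz}}
  = \frac{10 \times 10^{6}}{50 \times 10^{3}}
  = 200
\]
```

Under that assumption a scheduler has up to 200 control steps per sample in which to place operations, which is why a single functional unit can often be reused many times within one sample period.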
Allocation
Operations are assigned to functional hardware
Data values are assigned to storage elements
Optimization algorithm based on variable lifetimes
Design validation
[Figure: design validation flow linking Expectation, Input HDL, Synthesis and Output HDL; the links between them are checked by simulation or by formal verification]
Formal verification
Tests mathematical properties
Proves the equality of two designs
  Equality of Boolean expressions
  Bisimulation in process algebra (CCS)
Not yet mainstream
Requires a strict methodology for specification
Alleviates the limitations of simulation
Synthesis
requires the use of a subset of the HDL
Ignored
Access and file types; aliases; assertions; physical types; floating point
Special attributes
ARRIVAL, FALL_ARRIVAL, RISE_ARRIVAL DRIVE, RISE_DRIVE, FALL_DRIVE LOGIC_ONE, LOGIC_ZERO EQUAL, OPPOSITE DONT_TOUCH_NETWORK LOAD DONT_TOUCH MAX_AREA ENUM_ENCODING UNCONNECTED HOLD_CHECK, SETUP_CHECK MAX_TRANSITION, MAX_DELAY, MIN_DELAY, MIN_RISE_DELAY, MIN_FALL_DELAY
Easy changes
Number of states in a pipeline by changing a single constraint May result in a change in the control FSM
Further steps
Logic optimization Test insertion Retiming
RTL Descriptions
RTL descriptions are 3 to 5 times longer than behavioral descriptions
  More development time to get a good model
  More errors
  Less readable
The user manages the next-state transitions
The user has to manage most of the register allocation
More difficult to deal with:
  Conditions, pipelining, multi-cycle operations
  Memory and register reads and writes
  Loop boundaries
  Subprograms
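To make this bookkeeping concrete, here is a minimal RTL sketch of a recurrence of the form x := a*x + b*u, written with an explicit state machine. The entity name filter_rtl, the three-state split, the 8-bit truncation of the products, the synchronous reset and the use of ieee.numeric_std are all assumptions made for this illustration; they are not taken from the course material or from any tool output.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical RTL version of x := a*x + b*u.
-- The designer chooses the states and the registers by hand.
entity filter_rtl is
  port (clk, rst : in  std_logic;
        u, a, b  : in  unsigned(7 downto 0);
        x_out    : out unsigned(7 downto 0));
end filter_rtl;

architecture rtl of filter_rtl is
  type state_t is (S_MUL_A, S_MUL_B, S_ADD);   -- explicit schedule
  signal state     : state_t;
  signal x, t1, t2 : unsigned(7 downto 0);     -- explicit registers
begin
  process (clk)
    variable p : unsigned(15 downto 0);        -- full-width product
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= S_MUL_A;
        x  <= (others => '0');
        t1 <= (others => '0');
        t2 <= (others => '0');
      else
        case state is
          when S_MUL_A =>                      -- t1 := a*x (low 8 bits)
            p := a * x;
            t1 <= p(7 downto 0);
            state <= S_MUL_B;
          when S_MUL_B =>                      -- t2 := b*u (low 8 bits)
            p := b * u;
            t2 <= p(7 downto 0);
            state <= S_ADD;
          when S_ADD =>                        -- x := t1 + t2
            x <= t1 + t2;
            state <= S_MUL_A;
        end case;
      end if;
    end if;
  end process;

  x_out <= x;
end rtl;
```

Placing the two products in different states is what would let one multiplier serve both; changing that decision means rewriting the state machine, whereas in a behavioral description it would only mean changing a constraint.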
[Figure: scheduling and allocation illustration, with operations sharing hardware through MUXes]
Explicit constraints
I/O operations
Target technology
Clock cycle
Early timing analysis (allows chaining of operations)
Scheduling
Allocation
Reports on all the previous steps
If not satisfied, reiterate by changing the constraints; otherwise proceed with logic synthesis
References
David W. Knapp, Behavioral Synthesis: Digital System Design Using the Synopsys Behavioral Compiler, Prentice Hall PTR, 231 pages, 1996.
Covers behavioral synthesis and different case studies
Pran Kurup and Taher Abbasi, Logic Synthesis Using Synopsys, Second Edition, Kluwer, 322 pages, 1997.
Covers logic synthesis Original presentation Interesting scenarios
Giovanni De Micheli, Synthesis and Optimization of Digital Circuits, McGraw-Hill, 579 pages, 1994.
Best book on theory and algorithms for HLS Pipelines not covered Departs from Synopsys view of I/O
VHDL code
package types is
  subtype small_int is integer range 0 to 255;
end types;

library ieee;
use ieee.std_logic_1164.all;
use work.types.all;

entity ex_bhv is
  port (clk, stop: in std_logic;
        inport, alpha, beta: in small_int;
        outport: out small_int);
end ex_bhv;

library ieee;
use ieee.std_logic_1164.all;

architecture algo of ex_bhv is
begin
  process
    variable a, b, u, x: small_int;
  begin
    Reset_loop: loop -- Reset tail
      outport <= 0;
      u := 0;
      x := 0;
      a := alpha;
      b := beta;
      wait until clk'event and clk = '1';
      if stop = '1' then exit Reset_loop; end if;
      main_loop: loop -- normal mode behavior
        u := inport;
        x := a*x + b*u;
        outport <= x;
        wait until clk'event and clk = '1';
        if stop = '1' then exit Reset_loop; end if;
      end loop main_loop;
    end loop Reset_loop;
  end process;
end algo;
bc_analyzer> bc_time_design
Cumulative delay starting at inport_33:
  inport_33 = 0.000000
  mul_34_2 = 16.996946
  add_34 = 19.182245
  outport_35 = 19.182245
Cumulative delay starting at mul_34_2:
  mul_34_2 = 17.055845
  add_34 = 19.241144
  outport_35 = 19.241144
Cumulative delay starting at mul_34:
  mul_34 = 17.055845
  add_34 = 19.241144
  outport_35 = 19.241144
Cumulative delay starting at outport_35:
  outport_35 = 0.000000
Cumulative delay starting at add_34:
  add_34 = 13.757200
  outport_35 = 13.757200
Cumulative delay starting at beta_28:
  beta_28 = 0.000000
Cumulative delay starting at alpha_27:
  alpha_27 = 0.000000
Resource types (from the column headers): beta, p0, p1 and p2 are ports; r29 is a DW01_add; r47 and r41 are DW02_mult.

cycle | loop | beta | p0  | p1  | r29  | r47   | r41    | p2
------+------+------+-----+-----+------+-------+--------+-----
0     | L3   | R28  | R27 |     |      |       |        | W25
      | L0   |      |     |     |      |       |        |
1     | L6   |      |     | R33 |      | o1150 | o1150a |
2     |      |      |     |     | o841 |       |        | W35
3     | L8   |      |     |     |      |       |        |
      | L7   |      |     |     |      |       |        |
      | L5   |      |     |     |      |       |        |
      | L4   |      |     |     |      |       |        |
      | L2   |      |     |     |      |       |        |
      | L1   |      |     |     |      |       |        |
Operation name abbreviations
===============================
L0      loop boundaries            process_20_design_loop_begin
L1      loop boundaries            process_20_design_loop_end
L2      loop boundaries            process_20_design_loop_cont
L3      loop boundaries            Reset_loop/Reset_loop_design_loop_begin
L4      loop boundaries            Reset_loop/Reset_loop_design_loop_end
L5      loop boundaries            Reset_loop/Reset_loop_design_loop_cont
L6      loop boundaries            Reset_loop/main_loop/main_loop_design_loop_begin
L7      loop boundaries            Reset_loop/main_loop/main_loop_design_loop_end
L8      loop boundaries            Reset_loop/main_loop/main_loop_design_loop_cont
R27     8-bit read                 Reset_loop/alpha_27
R28     8-bit read                 Reset_loop/beta_28
R33     8-bit read                 Reset_loop/main_loop/inport_33
W25     8-bit write                Reset_loop/outport_25
W35     8-bit write                Reset_loop/main_loop/outport_35
o841    (8_8->8)-bit ADD_UNS_OP    Reset_loop/main_loop/add_34
o1150   (8_8->16)-bit MULT_UNS_OP  Reset_loop/main_loop/mul_34_2
o1150a  (8_8->16)-bit MULT_UNS_OP  Reset_loop/main_loop/mul_34
FSM
s_0_0: read(beta, alpha), write(0, outport)
s_1_1: read(u, inport), t1 := b*u, t2 := a*x
s_2_2: read(u, inport), t1 := b*u, t2 := a*x
s_2_3: t2 := t1 + t2, write(t2, outport)
present state | input | next state | actions
--------------+-------+------------+---------------------------------------------------
s_0_0         |       | s_1_1      | a_0: Reset_loop/beta_28 (read)
              |       |            | a_1: Reset_loop/alpha_27 (read)
              |       |            | a_4: Reset_loop/outport_25 (write)
s_1_1         |       | s_2_2      | a_2: Reset_loop/main_loop/inport_33 (read)
              |       |            | a_10: Reset_loop/main_loop/mul_34 (operation)
s_2_2         |       | s_2_3      | a_7: Reset_loop/main_loop/mul_34_2 (operation)
s_2_3         |       | s_2_4      | a_3: Reset_loop/main_loop/outport_35 (write)
              |       |            | a_19: Reset_loop/main_loop/add_34 (operation)
s_2_4         |       | s_2_2      | a_2: Reset_loop/main_loop/inport_33 (read)
              |       |            | a_10: Reset_loop/main_loop/mul_34 (operation)
FSM 2
s_0_0: read(beta, alpha), write(0, outport)
s_1_1: read(u, inport), t1 := a*x
s_2_2: t2 := b*u
Objectives
BC description
Inputs and outputs, capabilities and internal structure Provides a conceptual framework Understand error messages, the processing of the design Design good inputs to the BC
Interfaces
BC is a collection of functions embedded in a program: bc_shell
  Textual interface
  Graphic interface: Design Analyzer (DA) tool
BEHAVIORAL COMPILER
Design flow
VHDL analysis and elaboration -> explicit constraints -> scheduling -> allocation -> netlisting -> RTL .db form -> logic optimization
Inputs
Input mechanisms
HDL text bc_shell command language Pragma directives: comments embedded in the HDL text
HDLs
VHDL and Verilog One or more processes + logic external to processes BC does not process any interaction between processes Considers a process at a time without any ordering between processes
Processing steps
bc_shell> analyze -f vhdl mydesign.vhd
Elaborate Command
bc_shell> elaborate -s mydesign
The -s flag: elaborate for scheduling. It can be overridden by the rtl attribute attached to the process.
User constraints
Fixing a clock period and specifying the clock signal is mandatory
bc_shell> create_clock clk -period 9
BC timing analysis is accurate: bit_level instead of lumped timing models Report on combinational chains
The user may decide to change the operation, the implementation or the clock cycle
Scheduling
Operations mapped to control steps Non-concurrent operations may share the same multiplexed hardware
[Figure: control-dataflow graph with split and join nodes and arithmetic operations (*, /, +) assigned to csteps i and i+1]
CDFG
Abstract representation of the circuit behavior Without bias toward any schedule
Terminology
+, -, *, / belonging to cstep i are concurrent
BC (not DC) will allow - to be moved to the next cstep, so that a dual add/sub unit can be shared
CDFG edges represent precedence
Latency: total number of csteps
CDFG nodes
Data
Arithmetic/logic operations, some function calls
Synthetic nodes share hardware, random logic does not
Patch boxes: bit and field selection, constant sources
Memory R/W: memory accesses
I/O R/W: reads and writes to ports or signals
Conditional
split, join
Hierarchical
loops and function calls
Multicycling
Controls and muxes should be registered
If conditional, the FSM should commit at cycle i-1 to stabilize the registers: extra cstep
To force single-cycle operations regardless of timing analysis (use with caution):
bc_shell> bc_enable_multi_cycle = false
Pipelining
f(x) = g(h(x)): a register isolates h from g
Pipelining is either automatic or requested by implementation directives
More expensive than a k-cycle operation but k times faster
Multiplication and memory operations are prime candidates
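The idea of a register isolating h from g can be sketched at RTL as a two-stage pipeline. In the sketch below the entity name pipe2 and the two stage functions (h adds one, g doubles) are hypothetical placeholders chosen only to keep the example small, and ieee.numeric_std is an assumed arithmetic package.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Two-stage pipeline computing f(x) = g(h(x)).
-- h and g are stand-ins: h(x) = x + 1, g(v) = 2*v.
entity pipe2 is
  port (clk : in  std_logic;
        x   : in  unsigned(7 downto 0);
        f   : out unsigned(7 downto 0));
end pipe2;

architecture rtl of pipe2 is
  signal h_reg : unsigned(7 downto 0);  -- register between h and g
begin
  process (clk)
  begin
    if rising_edge(clk) then
      h_reg <= x + 1;                   -- stage 1: h(x)
      f     <= shift_left(h_reg, 1);    -- stage 2: g applied to the previous h
    end if;
  end process;
end rtl;
```

A new x can be accepted on every clock edge while the previous value is still in stage 2, so each result takes two cycles to appear but one result is produced per cycle once the pipeline is full.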
[Figure: illustration of chaining, multicycling and pipelining on a CDFG fragment, including a pipelined operation f = g(h)]
CDFG edges
Data edges
Represent values
Precedence edges
Represent order and control
The t and f outputs of a split node
Constraints
bc_shell> set_min_cycles 3 -from sub1 -to add3
Speculative execution
The pre-computed result is stored in a register and discarded if the branch is not taken
Default: turned off, since it enlarges the search space and the execution time
bc_shell> bc_enable_speculative_execution
Templates
Precedence and data arcs cannot express a maximum allowable duration
A prescheduled sub-design has to be preserved
Templates
Collection of operations allowed to move only as a group
Rigid timing relationship between its elements
Slots contain placeholders and/or other nodes
The user notices them when a schedule is impossible
Scheduling
Objective
Minimize hardware cost within user timing constraints
  Ex: 2 additions on a single adder if occurring in
Minimize cost of registers
  Lower bound on the number of registers = minimum number of bits crossing cstep boundaries
Algorithm
while not all operations are scheduled
  choose the most important unscheduled operation OP
  assign OP to the most cost-effective cstep
  mark OP as scheduled
Scheduling (contd)
Criteria for selecting operations op
Ready
Highest implementation cost
Mobility
Scheduling (contd)
Additional complexity
Conditionals
Loops
Pipelining
Memory operations
BC schedules bottom up
Innermost loops and function calls first
Inline each completed level
Inlined loops are encapsulated in templates
Inconsistencies may appear at higher levels due to templates
$y'' + 3xy' + 3y = 0$, with $x(0) = x$, $y(0) = y$, $y'(0) = u$
------------------------------------------
diffeq {
  read (x, y, u, dx, a);
  repeat {
    x1 = x + dx;
    u1 = u - (3 * x * u * dx) - (3 * y * dx);
    y1 = y + u * dx;
    c = x1 > a;
    x = x1; u = u1; y = y1;
  } until (c);
  write (y);
}
Allocation
Operations should be mapped onto particular hardware resources
The number of resources is assumed to be given by the scheduling step
Unit selection and mapping affect both speed and cost
Unallocated = operations & values
while Unallocated is not empty
  choose U
  if not (free(R, time(U)) & Impl(R, U)) then add new resource R
  assign(U, best(R))
  mark U allocated
Allocation (contd)
Avoid false paths, otherwise logic synthesis will have a hard time
(a, b, c) and (d, e) are chained operations in different csteps
[Figure: chained operations a, b, c and d, e with nodes x, y, z, w, showing a false path]
Allocation criteria
Cost
Allocate the most expensive operations first
Critical path
Operations and operands affecting the clock cycle first
Interconnect
Cluster operations and operands
Minimize interconnects
Avoid false paths
> set_common_resource op1 op2 op3 -min_count 2
Netlisting
Output of BC goes to logic synthesis
Random logic required by the user is instantiated
Registers, operators and memories are instantiated according to the allocation step
MUXes, nets and connectivity hardware are constructed to connect the datapath
The whole design is connected to signals and ports
Status and control points are recorded for later hookup to the control FSM
Control FSM
A state graph is constructed
A set of control actions is constructed
  Each of these drives a control point
  Actions are annotated on the transitions of the state graph
Status points are mapped from the scheduled CDFG
The netlist is augmented with a Control Unit (CU)
  Inputs to the CU are status signals
  Outputs are connected to the control points
A state table is constructed
  It will serve as input to the FSM compiler
BC constraints on loops
Loop-end at least one cstep after loop-begin
Loop exit at least one cstep after condition evaluation
1+ clock edges inside a loop (0 not allowed)

if cc then
  L1: while cond loop
    wait until clk'event and clk = '1';
    wait until clk'event and clk = '1';
  end loop;
else
  L2: while cond loop
    wait until clk'event and clk = '1';
  end loop;
end if;
BC outputs
bc_shell> report_schedule -operations -variables
bc_shell> write -hierarchy -format vhdl -out mydesign.vhd
bc_shell> compile -map_effort medium
bc_shell> optimize_registers
bc_shell> write -format edif -hierarchy
Objectives
VHDL styles for synthesis
Overall structure of models for simulation
BC interpretation of constructs
Main features of BC
Simulation and comparison of the design before and after synthesis
  The design should be tested thoroughly before synthesis
  Pre-synthesis simulation is faster than post-synthesis simulation
  Test as much as possible at the behavioral level
Development of good test benches is very important
  It is also very time-consuming
  It may take longer than the development of the model itself
Pre-synthesis model
Design
response file
Test bench
The design
Must be represented by
A VHDL entity An associated architecture
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;

entity design is
  port (clk, reset: in std_logic;
        ip: in signed(7 downto 0);
        op: out signed(7 downto 0));
end design;

architecture behave of design is
  -- type, signal, component, ... declarations
begin
  -- component instantiations
  -- concurrent statements
  -- RTL processes
  -- behavioral processes
end behave;
Behavioral processes
BC schedules behavioral processes only
Random logic outside processes and RTL processes are preserved during scheduling
Multiple behavioral processes are scheduled independently
  No attempt is made by BC to maintain synchronicity
  The user must maintain synchronicity by providing strobes, ready signals, etc.
  Synchronization is more difficult when non-cycle-fixed I/O is present
Variables local to a process are mapped to registers
  Optimization is based on lifetimes
Interpretation
Each clock edge forces the process to await the next active clock edge before proceeding
An output write may be forced to fall one cycle after another operation
The user may insert any number of clock edges in the model
Edge polarities cannot be mixed inside the same process
Different processes may use different edges, clock nets, and frequencies
Sensitivity lists are not allowed in a behavioral process
Inside a behavioral process, only one signal and one polarity can be the argument of any wait statement
Synchronous resets
main: process
begin
  reset_loop: loop -- reset tail
    pc := (others => '0');
    sp := (others => '1');
    wait until clk'event and clk = '1';
    if reset = '1' then exit reset_loop; end if;
    main_loop: loop -- normal mode
      instr := memory(pc);
      wait until clk'event and clk = '1';
      if reset = '1' then exit reset_loop; end if;
      case (instr) is
        when "00100000" => ...
    end loop main_loop;
  end loop reset_loop;
end process main;
A reset net or port should be provided
An unused net is deleted during elaboration: add a dummy port or logic which uses the reset net
Asynchronous resets
Use set_behavioral_async_reset, or
If needed for pre-synthesis simulation or for readability:

wait until (clk'event and clk = '1')
  --synopsys synthesis_off
  or (reset'event and reset = '1')
  --synopsys synthesis_on
  ;
if reset = '1' then exit reset_loop; end if;
I/O Operations
entity design is
  port (clk, reset: in std_logic;
        ip: in signed(7 downto 0);
        op: out signed(7 downto 0));
end design;

architecture behave of design is
  signal sig: signed(7 downto 0);
begin
  P1: process
    variable v1, v2: signed(7 downto 0);
  begin
    wait until clk'event and clk = '1';
    v1 := ip;        -- read
    v2 := ip;        -- different read
    wait until clk'event and clk = '1';
    sig <= v1;       -- write
    wait until clk'event and clk = '1';
    op <= v2 + sig;  -- read and write
    wait until clk'event and clk = '1';
  end process P1;
end behave;
I/O Operations
I/O reads and writes are inferred from references to architecture signals or entity ports
Note the two different reads in the same cycle
Cycle stretched in 2 I/O modes
If only one read is wanted, re-use v1
Flow of Control
Most constructs supported
For, while , infinite loops If-then-elsif-else, case statements Functions, procedures
Next, exit
Associated with a reset, or Affect the immediate enclosing loop
General loops
Not unrolled
Infinite loops
While loops and loops with dynamic range
Loops with explicit conditional exit
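The distinction can be illustrated with a small behavioral process that contains both kinds of loop. This is only a sketch: the entity name loops, the port names and widths, and the use of ieee.numeric_std are assumptions, and whether a given tool actually unrolls the first loop depends on its settings. What the code does show is the difference in what is known before run time: the FOR loop has a trip count fixed at elaboration, the WHILE loop does not.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity loops is
  port (clk  : in  std_logic;
        din  : in  unsigned(7 downto 0);
        sum4 : out unsigned(9 downto 0);
        cnt  : out unsigned(7 downto 0));
end loops;

architecture behave of loops is
begin
  process
    variable acc : unsigned(9 downto 0);
    variable n   : unsigned(7 downto 0);
  begin
    -- Fixed-bound FOR loop: exactly 4 iterations, known at elaboration.
    acc := (others => '0');
    for i in 0 to 3 loop
      acc := acc + din;
      wait until clk'event and clk = '1';
    end loop;
    sum4 <= acc;

    -- General loop: the number of iterations depends on the data,
    -- so it has to stay rolled.
    n := (others => '0');
    while din /= 0 loop
      n := n + 1;
      wait until clk'event and clk = '1';
    end loop;
    cnt <= n;
    wait until clk'event and clk = '1';
  end process;
end behave;
```

Both loops contain a clock edge in every iteration, in line with the style used for behavioral processes elsewhere in these notes.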
Pipelined loops
loop
  a := inputport;
  wait until clk'event and clk = '1';
  b := op1(a);
  wait until clk'event and clk = '1';
  c := op2(b);
  wait until clk'event and clk = '1';
  d := op3(c);
  wait until clk'event and clk = '1';
  e := op4(d);
  wait until clk'event and clk = '1';
  outputport <= op5(e);
  wait until clk'event and clk = '1';
end loop;

[Figure: pipelined schedule of the loop. Each iteration runs read, op1, op2, op3, op4, then op5 and write over its latency; a new iteration starts every initiation interval of 2 cycles, so successive iterations overlap.]
Pipelined loops
Previous example hypothesis
No chaining possible
Operations are so different that they cannot share the same hardware
Before pipelining
1/6 resource utilization 1/6 throughput
After pipelining
1/2 utilization 1/2 throughput
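These figures can be checked with a short calculation. The sketch below assumes a loop body of six single-cycle steps, each on its own dedicated unit (so the latency is L = 6), and an initiation interval II = 2 after pipelining, which is the value implied by the 1/2 figures:

```latex
\[
\text{throughput} = \frac{1}{II}\ \text{iterations per cycle},
\qquad
\text{utilization of each unit} = \frac{1\ \text{busy cycle}}{II\ \text{cycles}}
\]
\[
\text{Before pipelining: } II = L = 6
\;\Rightarrow\; \text{throughput} = \tfrac{1}{6},\quad \text{utilization} = \tfrac{1}{6}
\]
\[
\text{After pipelining: } II = 2
\;\Rightarrow\; \text{throughput} = \tfrac{1}{2},\quad \text{utilization} = \tfrac{1}{2},
\quad \text{overlapping iterations} = \frac{L}{II} = 3
\]
```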
Regardless of the I/O mode, re-use of the outputs cannot be too close to the end of the pipelined loop
No exit later than II+1, to avoid an explosion of states
Rolled loops cannot be nested inside pipelined loops
  BC cannot determine statically the concurrency between iterations
Memory inference
Memories are specified using arrays
Memories consist of words
BC schedules accesses and controls ports
RAM accesses are synthetic
BC makes conservative assumptions about address conflicts
An address conflict occurs when there are two accesses to the same memory and one access is a write
  M(14) := 5; x := M(14);   True conflict
  M(14) := 5; x := M(13);   False conflict
BC does not distinguish between false and true conflicts
Override BC deduction:
> ignore_memory_precedences -from op1 -to op2
Memory code
architecture beh of mem_dsg is
  subtype resource is integer;
  attribute variables: string;
  attribute map_to_module: string;
  type mem_type is array (0 to 15) of signed(7 downto 0);
begin
  behavP: process
    constant Mem1: resource := 0; -- physical memory
    attribute variables of Mem1: constant is "M";
    attribute map_to_module of Mem1: constant is "DW03_ram1_s_d";
    variable M: mem_type; -- logical memory
  begin
    ...
    M(12) := "00101100"; -- memory write
    outport <= M(2*x);   -- memory read
Memory timing
Normal loops
Iterations are insulated R/W of one iteration strictly precede R/W of next iteration
Pipelined loop
Iterations co-exist, so inter-iteration conflicts appear
These conflicts may be false
> ignore_memory_loop_precedences {op1 op2}
[Figure: pipelined loop in which successive iterations access M(i), M(i+1), M(i+2), M(i+3) and compute f(M(i)), f(M(i+1)), ...]
II cannot be 1 or 2 due to a false memory conflict; it may be 3
Synthetic components
Component synthesized on the fly when needed
Ex: adders, multipliers ... Encapsulated in DesignWare libraries Sharable resources during allocation
DesignWare developer
Function or procedure used in more than one place
  It is not in the DW library
  The hardware implementation should be sharable
  Ex: MAC operation for DSP, with implementations
Repeated random logic modules
  Define a function instead of code such as
    if (cond) then x := d1; else x := d2; end if;
  Then use the map_to_module pragma
  Use a DW module instead of inlining the function
Simplify the FSM by moving parts to the data path
  The size of the FSM is exponential in the number of inputs
Preserved functions
By default, BC inlines subprograms during elaboration
To prevent inlining:
function fid (...) is -- synopsys preserve_function
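As a sketch of how a repeated selection could be wrapped in a function and kept as a unit, the package below defines a hypothetical function sel. The package name sel_pkg, the function name, its 8-bit operands and the exact placement of the pragma comment (directly after "is") are assumptions for illustration; only the preserve_function pragma text itself comes from these notes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package sel_pkg is
  function sel (cond   : boolean;
                d1, d2 : std_logic_vector(7 downto 0))
    return std_logic_vector;
end sel_pkg;

package body sel_pkg is
  function sel (cond   : boolean;
                d1, d2 : std_logic_vector(7 downto 0))
    return std_logic_vector is
    -- synopsys preserve_function
  begin
    -- Same behavior as: if (cond) then x := d1; else x := d2; end if;
    if cond then
      return d1;
    else
      return d2;
    end if;
  end sel;
end sel_pkg;
```

Inside a process the call would then read x := sel(cond, d1, d2); so that every use refers to the same preserved function rather than to a separately inlined copy of the if statement.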
Pipelined components
Comb. logic as a synth. comp. may have excessive delay.
1. Lengthen the clock cycle
   Bad solution
   Increases chaining while diminishing sharing
2. Allow multi-cycle operations
   Latency penalty
   Registered inputs
3. Pipelining
   May be obtained by retiming (optimize_registers)
   Some DW components are pipelined
   Use DW developer
   Use a directive
> set_pipeline_stages {op1 op2} -fixed_stages 3
V. I/O modes
I/O modes
Three I/O modes
three different interpretations of HDL semantics Modes define equivalence between the pre-synthesis and post-synthesis models
Pre and post synthesis designs perform the same operation at the same time on their inputs
Very strict, rules out scheduling
Cycle-Fixed Mode
Any scheduled mode has a fixed counterpart
A timing diagram not achievable in fixed mode is not achievable in any mode
If the source can talk correctly to its environment, the synthesized process will too
The source should be written so that BC can synthesize it without adding any clock cycle
free_loop: loop
  if ready then
    wait until clk'event and clk = '1';
    exit free_loop;
  end if;
  wait until clk'event and clk = '1';
end loop;
dataout <= data;
A: otherwise two conditions must be tested in the same cycle
B: otherwise one branch of the nested loop is without a clock edge
Two reads are locked to the same cycle
Operations are performed in 2 cycles
The extra cycle should be taken into account in the subsequent code
Superstate-Fixed Mode
Properties
Preserves the I/O ordering, but not necessarily the number of clock edges between I/O operations
The latency of the design may be changed by user commands without changing the HDL
> pipeline_loop main_loop -latency 16 -initiation 4
A superstate is the interval between 2 source clock edges. BC is allowed to add clock edges to a superstate
Equivalence
Any I/O write will take place in the last cycle of the superstate An I/O read can take place in any cycle of the superstate
while (not ready) loop
  tmp := inport; -- read
  -- edge 1
  wait until clk'event and clk = '1';
  -- edge 2
  wait until clk'event and clk = '1';
  outport <= data; -- illegal
end loop;

[Figure: the clock edges delimit superstate A and the continuing superstate B]
Ex: the 1st superstate starts outside of the loop, so the outside write has to migrate inside the loop (contradiction)
Free-Floating Mode
I/O operations are free to float with respect to one another
Operations on a single port are partially ordered
  A series of reads can be permuted
No ordering between operations on different ports
Data precedences and constraints are respected
Deleting or adding clock edges is permitted
If two signals are logically bound, express it using manual constraints
Scheduling Constraints
> preschedule p2/res_loop/main/sub_107 4
Forces the named operation into a particular cstep
The cstep is relative to the beginning of the enclosing hierarchical context
  Here sub_107 will be put in the 5th cstep of loop main
> set_cycles 3 -from op1 -to op2
chain_operations: equivalent to set_cycles 0
dont_chain_operations: equivalent to set_min_cycles 1
remove_scheduling_constraints: removes all explicit constraints
> set_common_resource op1 op2 op3 -min_count 2
Shell Variables
> bc_enable_chaining = false
Globally turns off chaining of synthetic operations; prefer more specific constraints
bc_enable_chaining: true by default
bc_enable_multi_cycle: true by default
bc_enable_speculative_execution: false by default
Shell Commands
set_margin controls the margin allowed for control and muxing delays when timing the design before scheduling
> register_control -inputs -outputs
Forces registers on inputs and/or outputs of the control FSM
May improve the cycle time, but may increase latency if conditionals are on the critical path
set_stall_pin
Used to stall the design until some external event occurs
Equivalent to a gated clock
After P&R
Back-annotate real delay values
Perform in-place optimization to meet routing delays
Design refinement
Block diagram of ASIC created after step 1
HDL coding of each block
Style of coding important for synthesis
Knowledge of tool internals helps to write good synthesis code
Critical path may traverse hierarchy boundaries
  Best results when the critical path is in one block
Ensure registered block outputs
  Avoids complicated timing budgeting
HDL FF Code
entity comp is
  port (b, c: in bit;
        qout: out bit);
end comp;

architecture FF of comp is
begin
  P1: process
  begin
    wait until c'event and c = '1';  -- rising edge of c acts as the clock
    qout <= b;
  end process P1;
end FF;
[Figure: D flip-flop with D = b, clk = c, Q = qout]
MUX inference
Often gates are inferred instead of MUXes
Mapping to MUXes can be forced by:
  the map_to_entity pragma, or
  function calls, or
  instantiating MUXes from the Synopsys generic library (gtech.db) and assigning the map_only attribute
[Figure: 2-to-1 MUX with input a on 0, input b on 1, select s, output f]
MUX modeling
entity comp is
  port (a, b, s: in bit;
        f: out bit);
end comp;

architecture mux of comp is
begin
  P1: process (a, b, s)
  begin
    case s is
      when '0' => f <= a;
      when '1' => f <= b;
    end case;
  end process P1;
end mux;
Logic synthesis
Transform RTL HDL to gates
Optimize by selecting the optimal combination of technology library cells
[Figure: the ASIC vendor library (vdlib.db) is processed by Synopsys Library Compiler and the liban utility to produce vdlib.vhd.E and vdlib_components.vhd]
vdlib.vhd.E: encrypted, contains simulation models with timing delays
vdlib_components.vhd: package with declarations for all the cells of the ASIC vendor library
If the source (.lib) is available, the user can control the type of the model by setting the dc_shell variable vhdllib_architecture
write_lib -f vhdl
Data
Full scan: combinational ATPG
Partial scan: sequential ATPG
TC automatically replaces sequential cells by scan cells
TC generates test patterns and computes fault coverage (single s-a-0/1 model)
Design Re-use
Achieves fast turnaround on complex designs
DesignWare is a mechanism to build a library of re-usable components
Generic GTECH Library
Source read in DC is converted to a netlist of GTECH components and inferred DW parts
gtech.db contains basic logic gates, flip-flops, a half adder and a full adder
DW libraries
Standard, ALU, Maths, Sequential, Data Integrity, Control Logic and DSP
Adders, counters, comparators, decoders
Parts are parametrizable, synthesizable, testable, technology independent
Parts have simulation models
When used, implementation selection, arithmetic optimization and resource sharing are on
[Figure: Transmitter Block containing the Transmitter FSM and Decode Logic (DW03_DECODE)]
FPGA Synthesis
User-programmable IC: set of logic blocks that can be connected using routing resources
Interconnect: wires of different lengths and programmable switches
Easy to configure by the user
Implement logic circuits at relatively low cost with a fast turnaround
Hardware emulation: use programmable hardware as a prototype of an IC design
Rapid growth and density of FPGAs: need for synthesis tools
FPGA Compiler from Synopsys: maps HDL descriptions to logic blocks and provides the configuration of switches
Links to layout
Advent of sub-micron technology: net delays become significant
  Gate delays decrease while wire delays increase due to capacitances
Accurate wire loads and physical hierarchy become crucial to synthesis tools
Synopsys Floorplan Manager transfers information between back-end tools and DC
Formats for transfer:
  Standard Delay Format (SDF)
  Physical Data Exchange Format (PDEF)
  Synopsys set_load script
DC and DA environments
Design Analyzer (DA): graphical front end of Synopsys environment
Used to view schematics and their critical path
Startup files
DC reads .synopsys_dc.setup when invoked
Recommendation: keep .synopsys_dc.setup in the current working directory, so that design-specific variables are specified without affecting other designs
Link library
Used when the design is already a netlist, or when the source instantiates technology library cells
Symbol libraries
Contain pictorial representation of library cells
> compare_lib <target_library> <symbol_library>
Shows any differences between the two libraries
Libraries generation
Libraries are generated from ASIC vendor files (.lib, .slib) by Synopsys Library Compiler, which produces .db and .sdb libraries
> read_lib my_lib.lib
> write_lib my_lib.db
> read_lib my_lib.slib
> write_lib my_lib.sdb
Signals
Need delta time
Signals used by a process should appear in its sensitivity list, otherwise RTL and gate-level simulation may differ
STD_LOGIC
TYPE std_ulogic IS ( 'U',  -- Uninitialized
                     'X',  -- Forcing Unknown
                     '0',  -- Forcing 0
                     '1',  -- Forcing 1
                     'Z',  -- High Impedance
                     'W',  -- Weak Unknown
                     'L',  -- Weak 0
                     'H',  -- Weak 1
                     '-'   -- Don't care
                   );
attribute ENUM_ENCODING of std_ulogic : type is "U D 0 1 Z D 0 1 D";
FUNCTION resolved ( s : std_ulogic_vector ) RETURN std_ulogic;
SUBTYPE std_logic IS resolved std_ulogic;
Arithmetic
library IEEE;
use IEEE.std_logic_1164.all;
package std_logic_arith is
  type UNSIGNED is array (NATURAL range <>) of STD_LOGIC;
  type SIGNED is array (NATURAL range <>) of STD_LOGIC;
  subtype SMALL_INT is INTEGER range 0 to 1;
  function "+"(L: UNSIGNED; R: UNSIGNED) return UNSIGNED;
  ...
  function "+"(L: INTEGER; R: UNSIGNED) return UNSIGNED;
  function "+"(L: INTEGER; R: SIGNED) return SIGNED;
  function "+"(L: UNSIGNED; R: UNSIGNED) return STD_LOGIC_VECTOR;
  function "+"(L: SIGNED; R: SIGNED) return STD_LOGIC_VECTOR;
Unwanted latches
Ensure
All signals initialized
Case and if statements completely defined
library ieee;
use ieee.std_logic_1164.all;

entity qst is
  port (clk: in std_logic;
        d: in std_logic_vector (1 downto 0);
        q: out std_logic_vector (1 downto 0));
end qst;

architecture unwanted of qst is
begin
  process (clk, d)
  begin
    if clk = '1' then
      q <= d;  -- incomplete: no else branch, so q must hold its value (latch)
    end if;
  end process;
end unwanted;
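For comparison, a completely defined version of the same process can be sketched as follows. The entity name qst_fixed and the choice of driving zeros in the else branch are assumptions made for illustration; the slides only state that the if statement must be completely defined.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical latch-free counterpart of qst.
entity qst_fixed is
  port (clk : in  std_logic;
        d   : in  std_logic_vector(1 downto 0);
        q   : out std_logic_vector(1 downto 0));
end qst_fixed;

architecture no_latch of qst_fixed is
begin
  process (clk, d)
  begin
    if clk = '1' then
      q <= d;
    else
      q <= (others => '0');  -- assumed default value: every branch assigns q
    end if;
  end process;
end no_latch;
```

Because q is assigned on every path through the process, no storage element is needed and only combinational logic is implied.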
Asynchronous reset
entity FF is
  port (x, clk, rst: in bit;
        z: out bit);
end FF;

architecture async of FF is
begin
  process (clk, rst)
    variable ST: ...;
  begin
    if rst = '0' then                    -- reset checked before the clock edge
      ST := S0;
      z <= '0';
    elsif clk'event and clk = '1' then
      case ST is
        ...
      end case;
    end if;
  end process;
end async;
Synchronous reset
entity FF is
  port (x, clk, rst: in bit;
        z: out bit);
end FF;

architecture sync of FF is
begin
  process
    variable ST: ...;
  begin
    wait until clk'event and clk = '1';
    if rst = '0' then                    -- reset sampled on the clock edge
      ST := S0;
      z <= '0';
    else
      case ST is
        ...
      end case;
    end if;
  end process;
end sync;
VHDL specifics
Case insensitive
Case statement
  Mutually exclusive branches
  Exhaustive
Sign interpretation
  Depends on data types and associated operations
  TYPE std_logic_vector IS ARRAY ( NATURAL RANGE <> ) OF std_logic;
  std_logic_signed, std_logic_unsigned: packages for operations on std_logic_vector
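A small simulation-only sketch can make the sign interpretation concrete. It uses the UNSIGNED and SIGNED types of std_logic_arith shown earlier; the entity name sign_demo and the constant v are illustrative assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;

-- Hypothetical testbench-style entity with no ports.
entity sign_demo is
end sign_demo;

architecture sim of sign_demo is
begin
  process
    constant v : std_logic_vector(3 downto 0) := "1111";
  begin
    -- The same bit pattern read as an unsigned number is 15 ...
    assert conv_integer(unsigned(v)) = 15
      report "unsigned interpretation differs" severity error;
    -- ... and read as a two's complement signed number is -1.
    assert conv_integer(signed(v)) = -1
      report "signed interpretation differs" severity error;
    wait;  -- stop the process after the checks
  end process;
end sim;
```

The bits never change; only the type chosen for the conversion decides which arithmetic meaning the operators apply.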
Multiple drivers
std_logic is a resolved data-type
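As a sketch of what the resolved type permits, the following drives one std_logic signal from two concurrent assignments. The entity bus_demo and its port names are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-driver bus; names are illustrative only.
entity bus_demo is
  port (a, b, sel : in  std_logic;
        y         : out std_logic);
end bus_demo;

architecture tri of bus_demo is
  signal shared : std_logic;  -- resolved type: several drivers are legal
begin
  -- Driver 1 drives the bus when sel = '1', otherwise releases it.
  shared <= a when sel = '1' else 'Z';
  -- Driver 2 drives the bus when sel = '0', otherwise releases it.
  shared <= b when sel = '0' else 'Z';
  y <= shared;
end tri;
```

With an unresolved type such as std_ulogic or bit, the second assignment to shared would be rejected because the signal would have more than one driver.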
Components
Declared, configured, instantiated
Present State
Mealy machine
State encoding
Default
n FFs: up to 2^n states
Gray
[Figure: state diagram with states s0, s1, s2, s3; three transitions labelled x=1/z=0 and one transition into s0 labelled x=1/z=1]
Recommended style
architecture Rec of ET is
  signal currentS, nextS: state;
  attribute state_vector: string;
  attribute state_vector of Rec: architecture is "currentS";
begin
  COMB: process (currentS, x)
  begin
    case currentS is
      when s0 =>
        if x = '0' then
          z <= '0'; nextS <= s0;
        else
          z <= '0'; nextS <= s1;
        end if;
      ...
    end case;
  end process;  -- Outputs not registered

  SYNC: process
  begin
    wait until clk'event and clk = '1';
    currentS <= nextS;
  end process;
end Rec;
Explicit encoding
architecture Rec of ET is
  type state is (s0, s1, s2, s3);
  attribute enum_encoding: string;
  attribute enum_encoding of state: type is "000 110 111 101";
  signal currentS, nextS: state;
begin
  COMB: process (currentS, x) ...
  SYNC: process ...
end Rec;
fail-safe behavior
architecture One of ET is
  type state is (s0, s1, s2, s3);
  signal st: state;
begin
  process
  begin
    wait until clk'event and clk = '1';
    if x = '0' then
      z <= '0';
    else
      case st is
        when s0 => st <= s1; z <= '0';
        when s1 => st <= s2; z <= '0';
        when s2 => st <= s3; z <= '0';
        when others => st <= s0; z <= '1';
      end case;
    end if;
  end process;
end One;
Memories
Not synthesized by DC
Instantiated as black boxes
HDL description for simulation
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

entity ram_vhd is
  generic (width: natural := 8;
           depth: natural := 16;
           addW: natural := 4);
  port (addr: in std_logic_vector(addW-1 downto 0);
        datain: in std_logic_vector(width-1 downto 0);
        dataout: out std_logic_vector(width-1 downto 0);
        rw, clk: in std_logic);
end ram_vhd;
Memory behavior
architecture behv of ram_vhd is
  subtype wtype is std_logic_vector(width-1 downto 0);
  type mem_type is array(depth-1 downto 0) of wtype;
  signal memory: mem_type;
begin
  -- Synchronous write when rw = '0'
  process
  begin
    wait until clk = '1' and clk'event;
    if (rw = '0') then
      memory(conv_integer(addr)) <= datain;
    end if;
  end process;

  -- Asynchronous read when rw = '1', high impedance otherwise
  process (rw, addr)
  begin
    if (rw = '1') then
      dataout <= memory(conv_integer(addr));
    else
      dataout <= wtype'(others => 'Z');
    end if;
  end process;
end behv;
Barrel shifter
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;     -- UNSIGNED type, SHL and SHR
use IEEE.std_logic_unsigned.all;

entity bs_vhd is
  port (datain: in std_logic_vector(31 downto 0);
        direct: in std_logic;
        count: in std_logic_vector(4 downto 0);
        dataout: out std_logic_vector(31 downto 0));
end bs_vhd;

architecture behv of bs_vhd is
  function b_shift (din: in std_logic_vector(31 downto 0);
                    dir: in std_logic;
                    cnt: in std_logic_vector(4 downto 0))
    return std_logic_vector is
  begin
    if (dir = '1') then
      return std_logic_vector(SHR(unsigned(din), unsigned(cnt)));
    else
      return std_logic_vector(SHL(unsigned(din), unsigned(cnt)));
    end if;
  end b_shift;
begin
  dataout <= b_shift(datain, direct, count);
end behv;
Multi-bit register
library IEEE;
use IEEE.std_logic_1164.all;

entity reg_vhd is
  generic (width: natural := 8);
  port (r: in std_logic_vector(width-1 downto 0);
        clk, ena, rst: in std_logic;
        data: out std_logic_vector(width-1 downto 0));
end reg_vhd;

architecture behv of reg_vhd is
  signal gclk: std_logic;
begin
  gclk <= clk and ena;  -- gated clock
  process (rst, gclk)
  begin
    if (rst = '0') then
      data <= (others => '0');
    elsif gclk'event and gclk = '1' then
      data <= r;
    end if;
  end process;
end behv;
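The register above gates its clock with ena. As an alternative that is not taken from the slides, the enable can be tested inside the clocked branch so that the clock itself stays ungated; the entity name reg_en_vhd is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical variant of reg_vhd using a data enable instead of a gated clock.
entity reg_en_vhd is
  generic (width : natural := 8);
  port (r             : in  std_logic_vector(width-1 downto 0);
        clk, ena, rst : in  std_logic;
        data          : out std_logic_vector(width-1 downto 0));
end reg_en_vhd;

architecture behv of reg_en_vhd is
begin
  process (rst, clk)
  begin
    if rst = '0' then
      data <= (others => '0');       -- asynchronous, active-low reset
    elsif clk'event and clk = '1' then
      if ena = '1' then              -- load only when enabled
        data <= r;
      end if;
    end if;
  end process;
end behv;
```

Which form is preferable depends on the target library and on power and timing goals; the gated-clock version saves clock activity, while this one keeps a single clock network.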
Objectives
How to get the best results
Commonly used DC commands
Methodology to optimize a design
General guidelines
Synthesis constraints
Optimization constraints
Speed: create_clock, set_input_delay, set_output_delay, max_delay
Area
[Figure: pin a driving pins b, c and d]
$$\sum_{i \in \{b,\,c,\,d\}} \mathrm{fanout\_load}(i) \le \mathrm{max\_fanout}(a)$$
DRC
max_transition
Longest time for a 0-1 or 1-0 transition
Specific to a net or to the whole design
Related to the RC time
The more restrictive value applies (technology library, user)
max_capacitance
Direct control on capacitance
Can be used with max_transition
Violations reported
max_fanout and max_capacitance control buffering
$$\mathrm{max\_capa}(\text{driving pin}) \ge \sum_{i \in \text{driven pins}} \mathrm{capa}(i)$$
Related commands
set_max_transition <value> <design_name/port_name>
set_max_fanout <value> <design_name/port_name>
set_max_capacitance <value> <design_name/port_name>
Optimization constraints
Speed and area constraints are set by the user
Timing has priority over area
Synchronous paths are constrained by specifying all clocks
set_max_delay and set_min_delay specify point-to-point asynchronous constraints
Commands
create_clock
set_input_delay
set_output_delay
set_driving_cell
set_load
set_max_area
Cost functions
Importance
Max delay
Min delay
Max power
Max area
Others
A setup requirement of a sequential element that is not respected counts as a violation
Path group = paths constrained by the same clock
Weights are attached to path groups
Min delay is independent of groups = worst min
Max power for ECL only
Area optimization performed only if specified
Optimization is an iterative process
Clock specification
Define each clock by create_clock
Clock trees must be hand instantiated
Use set_dont_touch_network to prevent buffering of clock trees
DC considers the clock network delay ideal, even for gated clocks
Use set_clock_skew to override ideal behavior
Use set_clock_skew -uncertainty to specify an upper limit
Timing reports
library IEEE;
use IEEE.std_logic_1164.all;

entity FF2 is
  port (a, b, clk, rst: in std_logic;
        d: out std_logic);
end FF2;

architecture two of FF2 is
  signal f: std_logic;
begin
  -- First register: f samples a
  process (clk, rst)
  begin
    if rst = '0' then
      f <= '0';
    elsif clk'event and clk = '1' then
      f <= a;
    end if;
  end process;

  -- Second register: d samples f and b
  process (clk, rst)
  begin
    if rst = '0' then
      d <= '0';
    elsif clk'event and clk = '1' then
      d <= f and b;
    end if;
  end process;
end two;
      synch_clear => Logic0, synch_preset => Logic0, synch_toggle => Logic0,
      synch_enable => Logic1, next_state => a_port, clocked_on => clk_port,
      Q => f, QN => n74);
  d_reg : SYNOPSYS_BASIC_SEQUENTIAL_ELEMENT
    generic map ( ac_as_q => 5, ac_as_qn => 5, sc_ss_q => 5 )
    port map (
      clear => n67, preset => Logic0, enable => Logic0, data_in => LogicX,
      synch_clear => Logic0, synch_preset => Logic0, synch_toggle => Logic0,
      synch_enable => Logic1, next_state => d56, clocked_on => clk_port,
      Q => d_port, QN => n75);
  LogicX <= '0';
end SYN_two;

entity SYNOPSYS_BASIC_SEQUENTIAL_ELEMENT is
  generic ( ac_as_q, ac_as_qn, sc_ss_q : integer );
  port ( clear, preset, enable, data_in, synch_clear, synch_preset,
         synch_toggle, synch_enable, next_state, clocked_on : in std_logic;
         Q, QN : buffer std_logic );
end SYNOPSYS_BASIC_SEQUENTIAL_ELEMENT;
      elsif ( synch_enable = '1' ) then
        Q <= next_state;
        QN <= not( next_state );
      end if;
    end if;
  end process;
end RTL;
architecture SYN_two of FF2 is
  component AN2
    port ( A, B: in std_logic; Z: out std_logic );
  end component;
  component FD2
    port ( D, CP, CD: in std_logic; Q, QN: out std_logic );
  end component;
  signal f, n79, n80, n81: std_logic;
begin
  U28   : AN2 port map ( A => f, B => b, Z => n79 );
  f_reg : FD2 port map ( D => a, CP => clk, CD => rst, Q => f, QN => n80 );
  d_reg : FD2 port map ( D => n79, CP => clk, CD => rst, Q => d, QN => n81 );
end SYN_two;
[Figure: schematic of a driving FD2 (f_reg), whose output f and input b feed AN2 (net n79), which drives the D input of a second FD2 (d_reg) with output d; clk on CP and rst on both flip-flops]
Reports
read -f vhdl test.vhd
link_library=target_library=lsi_10k.db
create_clock clk -period 5
compile -exact_map
report_timing -max_paths 5
[Figure: timing diagram from clk to Q(f_reg) (1.42) to Z (0.82), arrival at 2.24, setup 0.85, slack 1.91]
Startpoint: f_reg (rising edge-triggered flip-flop clocked by clk)
Endpoint: d_reg (rising edge-triggered flip-flop clocked by clk)
Path Group: clk
Path Type: max

Point                                   Incr       Path
-----------------------------------------------------------
clock clk (rise edge)                   0.00       0.00
clock network delay (ideal)             0.00       0.00
f_reg/CP (FD2)                          0.00       0.00 r
f_reg/Q (FD2)                           1.42       1.42 f
U28/Z (AN2)                             0.82       2.24 f
d_reg/D (FD2)                           0.00       2.24 f
data arrival time                                  2.24

clock clk (rise edge)                   5.00       5.00
clock network delay (ideal)             0.00       5.00
d_reg/CP (FD2)                          0.00       5.00 r
library setup time                     -0.85       4.15
data required time                                 4.15
-----------------------------------------------------------
data required time                                 4.15
data arrival time                                 -2.24
-----------------------------------------------------------
slack (MET)                                        1.91
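The slack in this report can be checked by hand from the increments listed for the f_reg to d_reg path. The symbol names below are introduced here only for the calculation.

```latex
\begin{align*}
t_{\text{arrival}}  &= 1.42 + 0.82 = 2.24 \\
t_{\text{required}} &= 5.00 - 0.85 = 4.15 \\
\text{slack}        &= t_{\text{required}} - t_{\text{arrival}} = 4.15 - 2.24 = 1.91
\end{align*}
```

A positive slack means the setup requirement of d_reg is met with the 5 ns clock period.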
> set_input_delay 3 -clock clk a

Point                                   Incr       Path
-----------------------------------------------------------
clock clk (rise edge)                   0.00       0.00
clock network delay (ideal)             0.00       0.00
input external delay                    3.00       3.00 r
a (in)                                  0.00       3.00 r
f_reg/D (FD2)                           0.00       3.00 r
data arrival time                                  3.00

clock clk (rise edge)                   5.00       5.00
clock network delay (ideal)             0.00       5.00
f_reg/CP (FD2)                          0.00       5.00 r
library setup time                     -0.85       4.15
data required time                                 4.15
-----------------------------------------------------------
data required time                                 4.15
data arrival time                                 -3.00
-----------------------------------------------------------
slack (MET)                                        1.15
Point                                   Incr       Path
-----------------------------------------------------------
clock clk (rise edge)                   0.00       0.00
clock network delay (ideal)             0.00       0.00
d_reg/CP (FD2)                          0.00       0.00 r
d_reg/Q (FD2)                           1.37       1.37 f
d (out)                                 0.00       1.37 f
data arrival time                                  1.37

clock clk (rise edge)                   5.00       5.00
clock network delay (ideal)             0.00       5.00
output external delay                  -2.00       3.00
data required time                                 3.00
-----------------------------------------------------------
data required time                                 3.00
data arrival time                                 -1.37
-----------------------------------------------------------
slack (MET)                                        1.63
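For the output path the required time is set by the external output delay rather than by a library setup time. A worked check of the figures in this report, with symbol names introduced only for the calculation:

```latex
\begin{align*}
t_{\text{arrival}}  &= 1.37 \\
t_{\text{required}} &= 5.00 - 2.00 = 3.00 \\
\text{slack}        &= t_{\text{required}} - t_{\text{arrival}} = 3.00 - 1.37 = 1.63
\end{align*}
```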
Set_dont_touch
Useful in hierarchical designs
Assigned to a design or library cell
Allows keeping a subdesign unchanged during re-optimization
Applied to an instance u1
current_design = TOP
set_dont_touch u1
or
set_dont_touch find(cell,u1)
Applied to a design
current_design = BlockA
set_dont_touch find(design, BlockA)
Flattening
Puts combinational logic into two-level (sum-of-products) form
Achievable for less than 20 inputs
May be expensive
Example: Y1 = (a + b), X1 = Y1 c flattens to X1 = ac + bc
To specify
set_flatten true
set_structuring -timing true
To verify options
report_compile_options
Structuring
Improves area and gate count
Timing driven (by default) or boolean structuring
Boolean structuring: 2X to 4X compilation time
Example: X1 = (ab + ad), X2 = (bc + cd) is structured as Y1 = (b + d), X1 = a Y1, X2 = c Y1
Characterization
Used in hierarchical designs
Constraints on sub-designs depend on their environment
characterize captures the surrounding constraints
read -f db TOP.db
characterize u1
current_design = sub1
write_script > sub1.scr
compile
current_design = TOP
characterize u2
current_design = sub2
write_script > sub2.scr
compile
Guidelines
Specify accurate timing
Accurate point-to-point delays for asynchronous paths
create_clock, group_path for synchronous paths
Register output
Simplifies time budgeting
Guidelines (contd)
Group FSMs, optimize separately
Size: 250-5000
Middle-of-the-road strategy
  Balance hierarchical vs. large flat design
Critical path should not traverse hierarchical boundaries
Consider alternatives: instantiate logic vs. infer through DesignWare
Put in the same level of hierarchy
  Driving and driven cells of large fanouts
  Sharable resources, e.g. adders
Guidelines (contd)
Compile time too long ?
High map effort
Design too large
Declared false paths traversing hierarchies
Glue logic at top level
Inappropriate flattening
  Adders, muxes, XORs
  Over 20 inputs
Boolean optimization ON
Not enough memory
Extracting FSMs
package states is
  type state is (s0, s1, s2, s3);
end states;

use work.states.all;
entity ET is
  port (x, clk: in bit;
        z: out bit);
end ET;

architecture One of ET is
  signal st: state;
begin
  process
  begin
    wait until clk'event and clk = '1';
    if x = '0' then
      z <= '0';
    else
      case st is
        when s0 => st <= s1; z <= '0';
        when s1 => st <= s2; z <= '0';
        when s2 => st <= s3; z <= '0';
        when s3 => st <= s0; z <= '1';
        when others => st <= s0;
      end case;
    end if;
  end process;
end One;  -- registered outputs
> report_fsm
The design is not currently represented as a state machine
Schematic
> set_fsm_state_vector { st_reg[0] st_reg[1] }
> set_fsm_encoding { s0=2#00 s1=2#10 s2=2#01 s3=2#11 }
> group -fsm -design_name eg1_fsm
> current_design = eg1_fsm
> extract
[Figure: FSM block with inputs X and clk, followed by a flip-flop (FF) driving output Z]
> report_fsm
Clock : clk
Sense: rising_edge
Asynchronous Reset: Unspecified
Encoding Bit Length: 2
Encoding style : Unspecified
State Vector: { st_reg[0] st_reg[1] }
> report_fsm
Recognizes the FSM
Clock : Unspecified
Asynchronous Reset: Unspecified
Encoding Bit Length: 2
Encoding style : Unspecified
State Vector: { st_reg[1] st_reg[0] }
State Encodings and Order:
  S0 : 00
  S1 : 01
  S2 : 10
  S3 : 11
> set_fsm_encoding_style one_hot
> set_fsm_encoding { S0=2#1000 S1=2#0100 S2=2#0010 S3=2#0001 }
> set_fsm_minimize true
> compile -map_effort low