VLSI Physical Design
21ET64D2
            By,
            Dr. Premananda B.S.
                  Module Objectives
• The course enables the student to understand:
   –   Complete ASIC (and FPGA) design flow.
   –   Different adder and multiplier architectures.
   –   Logical effort, multi- and single-stage cells.
   –   Architectures of FPGA.
   –   Physical design aspects of ASIC design.
   –   Various algorithms used in physical design.
                          Books
• Michael John Sebastian Smith, “Application-Specific
  Integrated Circuits”, Addison-Wesley Professional, 1st Edition,
  1997 / Pearson Education, 2002.
• N. Weste and D. Harris, “CMOS VLSI Design: A Circuits and
  Systems Perspective”, 3rd Edition, Pearson Education, 2006.
• Vikram Arkalgud Chandrasetty, “VLSI Design: A Practical
  Guide for FPGA and ASIC Implementations”, Springer, 2011.
• Jan M. Rabaey, Anantha Chandrakasan, and Borivoje Nikolic,
  “Digital Integrated Circuits: A Design Perspective”, 2nd
  Edition, Pearson Education India.
• …
                   Modules
1. Types of ASIC, Design Flow, and Datapath Logic
   Cells
2. Datapath Logic Cells and ASIC Library Design
3. Programmable ASIC Architectures
4. ASIC Construction-I
5. ASIC Construction-II
                       Agenda
• Introduction, Types of ASICs
• ASIC design methodology
• ASIC design flow
   – FPGA Vs ASIC
• ASIC Cell Libraries
• Data Path Elements, Adders:
   – RCA, Carry save, Carry bypass, Brent-Kung adder
                Integrated Circuits
• Integrated Circuits are classified as:
   – General purpose integrated circuits (GPICs)
   – Application-specific integrated circuits (ASICs)
                Introduction Contd...
• An ASIC (pronounced “a-sick”) is a silicon chip hardwired to meet
  the specific application needs of one electronic product or a
  range of products.
• An ASIC is an IC chip customized for a particular use, rather
  than intended for general-purpose use.
• Many electronic systems require integrated, dedicated
  components that are specialized to perform a specific task or a
  limited set of tasks; these components are called ASICs.
• Each ASIC has its own architecture and must support its own
  protocol requirements.
• Today’s ASICs often integrate a complete system on a single
  chip (system on chip, SoC).
• The cost functions for an ASIC are typically its area, timing,
  and power targets.
               The Performance Cube
Figure: the performance cube, with axes delay, power, and cost; on every axis “smaller is better”.
                 Examples of ASICs
• Examples of ICs that are not ASICs include:
   – Standard parts such as memory chips (ROMs, RAMs)
   – TTL or TTL-equivalent ICs at different integration levels (SSI, MSI, LSI)
• Examples of ICs that are ASICs include:
   – A chip used as an IP core for microcontrollers, satellites, and
     other various applications related to the medical and research
     sectors
   – A chip designed to run in a digital voice recorder or a high-
     efficiency video codec
   – Automotive and Avionic Components
   – Satellite, Radar, and related Communication processors
   – Chips that embed a microprocessor, memory, or microcontroller as a cell together with other logic
           The Goal of ASIC Designer
▪ Meet the market requirement:
  •   Satisfying the customer’s need
  •   Increasing the functionality
  •   Reducing the cost
  •   Beating the competition
▪ Achieved by:
  • Using the next-generation silicon technologies
  • New design concepts and tools
  • High-level integration
             Types of ASICs
Different types of ASICs are:
• Full-custom ASICs
• Semi-custom ASICs
• Programmable ASICs
          Types of ASICs Contd...
Different types of ASICs are:
• Full-custom ASICs
• Semi-custom ASICs
  – Standard-cell–based ASICs (CBIC)
  – Gate-array–based ASICs
     • Channeled gate arrays
     • Channelless gate arrays
     • Structured gate arrays
• Programmable ASICs
  – Programmable Logic Devices (PLD)
  – Field Programmable Gate Array (FPGA)
VLSI Design Styles
                        Agenda
• Introduction, Types of ASICs
• ASIC design methodology
• ASIC and FPGA design flow
   – FPGA Vs ASIC
• ASIC Cell Libraries
• Data Path Elements, Adders:
   – RCA, Carry bypass, Carry save, Carry select, Conditional
     sum adder
                Full-Custom ASICs
• All mask layers are customized in a full-custom ASIC.
• An engineer designs some or all of the logic cells, circuits, or
  layouts specifically for one ASIC.
• Design a full-custom IC if there are no libraries available.
• Microprocessors were full-custom, but designers are turning to
  semi-custom ASIC techniques in this area too.
• CMOS technology is preferred for analog functions:
   – CMOS is the most widely available IC technology.
   – Increased levels of integration require mixing analog and digital
     functions on the same IC.
            Full-Custom ASICs Contd...
• Full-custom offers:
   – highest performance
   – can be made low-power
   – lowest part cost (smallest die size)
• Disadvantages:
   –   increased design time (time to market)
   –   increased complexity
   –   design expense (expertise required)
   –   higher risk
   –   most expensive to manufacture and to design
          Full-Custom ASICs Contd...
• Full-custom ASIC design makes sense only when:
   – no suitable libraries exist, or
   – existing library cells are not fast enough, or
   – the available pre-designed/pre-tested cells consume more
     power than the design can allow, or
   – the available logic cells are not compact enough to fit, or
   – the technology is so new or so specialized that no cell library exists.
              Semi-Custom ASICs
• In a semi-custom ASIC, all of the logic cells are predesigned
  and some (possibly all) of the mask layers are customized.
• Using predesigned cells from a cell library makes design much
  easier.
• Types of semi-custom ASICs are:
   – Standard-cell–based ASICs
   – Gate-array–based ASICs
      Standard–Cell-Based ASICs Contd...
• A cell-based ASIC (CBIC) uses pre-designed logic cells
  known as standard cells.
• A CBIC may also contain larger predesigned cells known as
  megacells (also called megafunctions, full-custom blocks,
  system-level macros, fixed blocks, cores…).
• Each standard cell in the library is constructed using full-
  custom design methods, but we can use these predesigned,
  pre-characterized, and pretested circuits without having to do
  any full-custom design.
• This design style gives the same performance and flexibility
  advantages of a full-custom ASIC but reduces design time
  and reduces risk.
      Standard–Cell-Based ASICs Contd...
• Advantages of CBICs are that designers:
   – save time and money
   – reduce risk by using a pre-designed, pre-characterized, and
     pre-tested standard-cell library
   – can use standard cells that are each optimized individually for speed or area
• Disadvantages are:
   – time or expense of designing or buying the standard cell library
   – time needed to fabricate all layers of ASIC for each new design
        Standard-Cell–Based ASICs Contd..
• Standard cells are designed to fit together like bricks in a wall, as shown in
  the next figure.
• Power and ground buses (VDD and GND or VSS) run horizontally on metal
  lines inside the cells.
                    Figure: Standard Cell Layout
      Standard-Cell–Based ASICs Contd..
• Standard cells are stacked like bricks in a wall; the abutment
  box (AB) defines the edges of the brick.
• The difference between the bounding box (BB) and the AB is
  the area of overlap between the bricks.
• Power supplies (VDD and GND) run horizontally inside a
  standard cell on a metal layer that lies above transistor layers.
• Each different shaded and labeled pattern represents a different
  layer.
• The standard cell shown has center connectors (the three squares labeled A1, B1,
  and Z) that allow the cell to connect to others.
• Standard-cell design allows the automation of the process of
  assembling an ASIC.
      Standard-Cell–Based ASICs Contd..
• Groups of standard cells fit horizontally together to form
  rows.
• Rows stack vertically to form flexible rectangular blocks.
   – A flexible block built from several rows of standard cells can be
     connected to other standard-cell blocks or to full-custom logic blocks.
• Both cell-based and gate-array ASICs use predefined cells, but
  there is a difference:
   – The transistor sizes in a standard cell can be changed to optimize speed
     and performance, whereas the device sizes in a gate array are fixed.
   – This results in a trade-off between performance and area in a gate array
     at the silicon level.
• The trade-offs between area and performance are made at the
  library level for a standard-cell ASIC.
• A CBIC structure is shown
• The important features of CBIC are:
   – All mask layers are customized - transistors and interconnect
   – Custom blocks can be embedded
   – Manufacturing lead time is about eight weeks
             Gate-Array–Based ASICs
• In gate array (GA) based ASICs the transistors are predefined
  on the silicon wafer.
• A predefined pattern of transistors on a GA is a base array.
• The smallest element that is replicated to make the base array
  is the base (primitive) cell.
• Because only the top few layers of metal, which define the
  interconnect between transistors, are defined by the designer
  using custom masks, a masked gate array (MGA) takes less time
  to make.
• The designer chooses from a gate-array library of predesigned
  and pre-characterized logic cells.
        Gate-Array–Based ASICs Contd..
• The logic cells in a gate array library are often called macros.
• Gate-arrays are sometimes referred to as prediffused arrays.
• Using wafers prefabricated up to the metallization steps reduces
  the time needed to make an MGA; the turnaround time is a few
  days to a couple of weeks.
• There are three types of MGAs:
   – Channeled gate arrays
   – Channelless gate arrays
   – Structured gate arrays
                 Channeled Gate Array
• In a Channeled gate array we leave space between the rows
  of transistors for wiring.
• The space for interconnecting between rows of cells is fixed
  in height.
• The important features of a channeled gate array are:
   – Only the interconnect is customized
   – The interconnect uses predefined spaces between rows of base cells
   – Manufacturing lead time is between two days and two weeks
               Channelless Gate Array
• A channelless GA is also called a sea-of-gates (SOG) array.
   – Only some (the top few) mask layers are customized: the interconnect
   – Manufacturing lead time is between two days and two weeks.
• Customizing the contact layer in channelless GAs allows us to
  increase the density of gate-array cells.
• No predefined areas are set aside for routing between cells.
• The logic density (the amount of logic that can be implemented
  in a given silicon area) is higher.
• When an area of transistors is used for routing, no contacts are
  made to the devices lying underneath; those transistors are
  simply left unused.
              Structured Gate Array
• A Structured Gate Array can be either channeled or
  channelless but it includes or embeds a custom block.
• An embedded gate array or structured gate array combines
  some of the features of CBICs and MGAs.
• An embedded GA is as shown.
          Structured Gate Array Contd...
• The important features of structured GAs are:
   – Only the interconnect is customized
   – Custom blocks (the same for each design) can be embedded
   – Manufacturing lead time is between two days and two
     weeks
• An embedded gate array gives the improved area efficiency
  and increased performance of a CBIC but with the lower cost
  and faster turnaround of an MGA.
• A disadvantage is that the embedded function is fixed, which can
  make the implementation of memory inefficient: an embedded memory
  block larger than the design needs is partly wasted.
              Programmable ASICs
• All the logic cells are predesigned and none of the mask layers
  are customized.
• Two types of programmable ASICs:
   – Programmable logic devices (PLDs)
   – Field-programmable gate array (FPGA)
           Programmable Logic Devices
• Examples and types of PLDs are:
   –   Read-only memory (ROM)
   –   Programmable ROM or PROM
   –   Electrically programmable ROM, or EPROM
   –   Erasable PLD (EPLD)
   –   Electrically erasable PROM, or EEPROM
   –   UV-erasable PROM, or UVPROM
   –   Logic arrays may be either a Programmable Array Logic (PAL)
       or a programmable logic array (PLA), both have an AND plane
       and an OR plane.
• A PLD is shown in the figure.
• Important features that all PLDs have in common:
   –   No customized mask layers or logic cells
   –   Fast design turnaround
   –   A single large block of programmable interconnect
   –   A matrix of logic macrocells that usually consist of programmable
       array logic followed by a flip-flop or latch
       Field-Programmable Gate Arrays
• Essential characteristics of an FPGA are:
   – None of the mask layers are customized
   – A method for programming the basic logic cells and the
     interconnect
   – The core is a regular array of programmable basic logic cells
     that can implement combinational as well as sequential logic
     (flip-flops)
   – A matrix of programmable interconnect surrounds the basic
     logic cells
   – Programmable I/O cells surround the core
   – Design turnaround is a few hours
Field-Programmable Gate Arrays Die
      Field-Programmable Gate Arrays
• All FPGAs contain a basic logic cell replicated in a regular
  array across the chip.
• Three different types of basic logic cells:
   – multiplexer based (Actel)
   – look-up table based (Xilinx)
   – programmable array logic (Altera)
• Choice among these depends on the programming technology.
• The programming technologies for FPGAs:
   – Antifuse
   – Static RAM cells
   – EEPROM transistor
                    ASIC Design Flow
• A design flow is a sequence of steps to design an ASIC
   – Chip specification: An engineer defines features, microarchitectures,
     functionalities, and specifications.
   – Design entry: Using an HDL or schematic entry.
   – Logic synthesis: Produces a netlist, a description of the logic cells and their connections.
   – System partitioning: Divide a large system into ASIC-sized pieces.
   – Prelayout simulation: Check to see if the design functions correctly.
   – Floorplanning: Arrange the blocks of the netlist on the chip.
   – Placement: Decide the locations of cells in a block.
   – Routing: Make the connections between cells and blocks.
   – Extraction: Determine resistance and capacitance of the interconnect.
   – Postlayout simulation: Check to see if the design still works with the
     added loads of the interconnect.
   – Final verification
   – GDS II
               ASIC Design Flow Process
•   Chip Specification
•   Design Entry
•   Functional Verification
•   RTL Synthesis
•   Partitioning of chip
•   Design for Test insertion
•   Floor planning
•   Placement Stage
•   Clock tree synthesis
•   Routing Stage
•   Final verification:
     – Layout versus schematic, Design rule checks, and Logical equivalence checks
• GDS II
                    What is Synthesis?
• A process of automatically transforming a hardware description from a
  higher level of abstraction to a lower level of abstraction, or from
  behavioral to structural domain.
• Translate HDL descriptions into logic gate networks.
   Synthesis = Translation + Optimization + Mapping (T O M)
HDL Synthesis
ASIC Design Flow
FPGA Design Flow
ASIC or FPGA?
          ASIC Design Advantages
Cost: lower unit costs
• At large volumes, an ASIC proves cheaper per unit than
  implementing the same design in an FPGA.
Speed: ASICs are faster than FPGAs
• An ASIC gives design flexibility.
• This gives enormous opportunities for speed optimization.
Low power:
• An ASIC can be optimized for the required low power.
• Several low-power techniques, such as power gating, clock
  gating, multi-VT cell libraries, and pipelining, are available.
       ASIC Design Disadvantages
Time-to-market:
• Some large ASICs can take a year or more to design.
• A good way to shorten development time is to make
  prototypes using FPGAs and then switch to an ASIC.
Design Issues:
• In an ASIC you must take care of deep-submicron (DSM) issues,
  signal-integrity issues, and many more.
• In an FPGA you do not face most of these because the FPGA
  vendor has already taken care of them.
Expensive Tools:
• ASIC design tools are very expensive.
           FPGA Design Advantages
•   Faster time-to-market
•   No non-recurring engineering (NRE) cost
•   Simpler design cycle
•   Field reprogrammability
•   More predictable project cycle
•   Reusability
       FPGA Design Disadvantages
• Power consumption is higher in an FPGA.
   – The designer has little control over power optimization. This is
     where the ASIC wins the race!
• The design must use the resources available in the FPGA.
   – Thus the FPGA limits the design size.
• Good for low-quantity production.
   – As quantity increases, the cost per unit becomes higher than
     that of an ASIC implementation.
Design Cycle
Exploding ASIC NRE Cost
Unit Cost Analysis
Time to Market
           System Reconfigurability
Figure: system reconfigurability over time.
• Lack of reconfigurability is a huge opportunity cost of ASICs.
• FPGAs offer flexible life cycle management.
Break-even Graph
Comparisons
                 ASIC Cell Libraries
For MGAs and CBICs, we have three choices:
➢ Use a design kit from the ASIC vendor
   – This is usually a phantom library: the cells are empty boxes, or
     phantoms. You hand off your design to the ASIC vendor, who performs
     phantom instantiation.
➢ Buy an ASIC-vendor library from a library vendor
   – This involves a buy-or-build decision.
   – You need a qualified cell library (qualified by the ASIC foundry).
   – If you own the masks, you have a customer-owned tooling solution.
➢ Build your own cell library in-house
   – This involves a complex and very expensive library development process.
• Each cell in an ASIC cell library must contain:
   –   physical layout
   –   behavioral model
   –   Verilog HDL/VHDL model
   –   timing model
   –   test strategy
   –   characterization
   –   circuit extraction
   –   process control monitors (PCMs)
   –   cell schematic & icon
   –   layout versus schematic (LVS) check
   –   logic synthesis
   –   retargeting
   –   wire-load model
   –   routing model
              Data Logic Cells
• Datapath Logic Cells
• Adders:
  –   Ripple carry adder
  –   Carry save adder
  –   Carry bypass adder
  –   Brent-kung adder
  –   Carry select adder
  –   Conditional sum adder
          Datapath Logic Cells
Full Adder (FA):
• SUM is the parity function ('1' for an odd number of '1's)
• COUT is the majority function ('1' if the majority of the inputs are '1')
 S[i] = SUM (A[i], B[i], CIN)
 COUT = MAJ (A[i], B[i], CIN)
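The parity and majority definitions of the full adder can be checked with a short sketch. The function name `full_adder` is illustrative and not taken from the slides; the exhaustive loop simply confirms that the two functions together behave like one-bit addition.

```python
from itertools import product

def full_adder(a, b, cin):
    """Return (sum, cout) for one-bit inputs a, b, cin."""
    s = a ^ b ^ cin                          # parity: 1 for an odd number of 1s
    cout = (a & b) | (a & cin) | (b & cin)   # majority: 1 if at least two inputs are 1
    return s, cout

# Exhaustive check against integer addition: a + b + cin == 2*cout + s
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert a + b + cin == 2 * cout + s
```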
         Datapath Logic Cells Contd…
• Cells are placed together in rows on a CBIC or an MGA, but
  there is no regularity to the arrangement of the cells within the
  rows.
• Software arranges the cells and completes the interconnect.
• Datapath layout automatically takes care of most of the
  interconnect between the cells.
• There are newer standard-cell and gate-array tools that can take
  advantage of regularity in a design and position cells carefully.
• The problem is in finding the regularity if it is not specified.
• Using a datapath is one way to specify regularity to ASIC
  design tools.
         Datapath Logic Cells Contd…
• Advantages of using a datapath:
   – Regular layout produces predictable and equal delay for each bit.
   – Interconnect between cells can be built into each cell.
• Disadvantages of using a datapath:
   – The overhead (buffering and routing the control signals, for example)
     can make a narrow (small number of bits) datapath larger and slower
     than a standard-cell (or even gate-array) implementation.
   – Datapath cells have to be pre-designed for use in a wide range of
     datapath sizes.
   – Datapath cell design can be harder than designing gate-array macros or
     standard cells.
   – Software to assemble a datapath is more complex and not as widely
     used as software for assembling standard cells or gate arrays.
                Datapath Elements
Figure: Symbols for a datapath adder.
(a) A data bus is shown by a heavy line and a bus symbol. If the bus is
   n-bits wide then MSB = n-1.
(b) An alternative symbol for an adder.
(c) Control signals are shown as lightweight lines.
Adders
Full Adder using Two Half Adders
         N-Bit RCA: Series of FA Cells
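• To add two n-bit numbers, chain n FA cells.
Figure: n-bit ripple-carry adder. Stage i is an FA cell with inputs A[i], B[i], and
carry-in C[i], and outputs S[i] and carry-out C[i+1]; C0 enters at the least
significant stage and Cn leaves the most significant stage.
Note: adder delay = Tc * n, where Tc is the Cin-to-Cout delay of one FA.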
           N-Bit RCA: Series of FA Cells
• It is possible to create a logic circuit using multiple full
  adders to add N-bit numbers.
• Each full adder takes as its Cin the Cout of the previous
  adder. This kind of adder is called a ripple-carry adder because
  each carry bit "ripples" to the next full adder.
• In a ripple-carry adder:
   –   Multiple full adders with carry-ins and carry-outs
   –   Chained together
   –   Small layout area
   –   Large delay time
        • If the delay of each full adder is 8 ns, the total delay of a four-bit
          parallel adder is 32 ns.
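A minimal behavioral sketch of the ripple-carry adder follows. The names `ripple_carry_add` and `ripple_delay_ns` are illustrative, and the delay model assumes the simple worst case of one full-adder delay per bit described above.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, cout)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)

def ripple_carry_add(a, b, n, c0=0):
    """Add two n-bit integers by chaining n full adders; returns (sum, carry_out)."""
    carry = c0
    total = 0
    for i in range(n):                       # bit 0 first: the carry ripples upward
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

def ripple_delay_ns(n, fa_delay_ns):
    """Worst-case delay, assuming every stage waits for the previous carry."""
    return n * fa_delay_ns

print(ripple_carry_add(0b0011, 0b0101, 4))   # sum and carry-out for 3 + 5
print(ripple_delay_ns(4, 8))                 # four stages at 8 ns each
```

The linear growth of `ripple_delay_ns` with `n` is the motivation for the faster carry schemes in the rest of the module.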
      4-bit Ripple Carry Addition: Example
A = 0011, B = 0101, C0 = 0 (A3 B3 = 0 0, A2 B2 = 0 1, A1 B1 = 1 0, A0 B0 = 1 1)

Time   C4  S3  C3  S2  C2  S1  C1  S0   S
T=0    0   0   0   0   0   0   0   0    0000
T=1    0   0   0   1   0   1   1   0    0110
T=2    0   0   0   1   1   0   1   0    0100
T=3    0   0   1   0   1   0   1   0    0000
T=4    0   1   1   0   1   0   1   0    1000
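The time steps in this example can be reproduced with a unit-delay simulation. This is a sketch under an assumed timing model: every full adder takes exactly one time step, all outputs start at 0, and each stage sees the carry its neighbor produced in the previous step. The function name `simulate_ripple` is illustrative.

```python
def simulate_ripple(a, b, n, steps):
    """Return the sum word (as a bit string) at T=0..steps under a unit-delay model."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    carries = [0] * (n + 1)          # carries[i] is the carry into bit i; carries[0] = C0 = 0
    sums = [0] * n                   # all outputs start at 0 (T=0)
    history = ["".join(str(s) for s in reversed(sums))]
    for _ in range(steps):
        old = carries[:]             # every FA sees the carries from the previous step
        for i in range(n):
            p = a_bits[i] ^ b_bits[i]
            sums[i] = p ^ old[i]
            carries[i + 1] = (a_bits[i] & b_bits[i]) | (p & old[i])
        history.append("".join(str(s) for s in reversed(sums)))
    return history

for t, word in enumerate(simulate_ripple(0b0011, 0b0101, 4, 4)):
    print(f"T={t}  S={word}")
```

The intermediate sum words are wrong until the carry has rippled through all four stages, which is why the adder output may only be sampled after the full worst-case delay.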
                             PGK
  • What happens to the propagating carry in bit position k?

    A  B  Cin  Cout
    0  0  -    0 (kill)
    0  1  C    C (propagate)
    1  0  C    C (propagate)
    1  1  -    1 (generate)

    p = A + B (or A ⊕ B)
    g = A·B

Figure: carry-chain circuit for bit position k with kill, 0-propagate, 1-propagate, and generate paths from C to Cout.
                           PGK
• For a full adder, define what happens to the carry-in C:
   – Generate: Cout = 1 independent of C
      • G = A • B
   – Propagate: Cout = C
      • P = A ⊕ B
   – Kill: Cout = 0 independent of C
      • K = ~A • ~B
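The generate, propagate, and kill cases can be expressed directly in code. In this sketch the function names `classify` and `carry_out` are illustrative; `carry_out` uses the XOR form of propagate and is checked against the majority function of the full adder.

```python
from itertools import product

def classify(a, b):
    """Classify one bit position by what it does to an incoming carry."""
    if a & b:
        return "generate"    # G = A·B: Cout = 1 regardless of C
    if a ^ b:
        return "propagate"   # P = A xor B: Cout = C
    return "kill"            # K = ~A·~B: Cout = 0 regardless of C

def carry_out(a, b, c):
    """Carry-out written in terms of generate and propagate: Cout = G + P·C."""
    g, p = a & b, a ^ b
    return g | (p & c)

for a, b, c in product((0, 1), repeat=3):
    majority = (a & b) | (a & c) | (b & c)
    assert carry_out(a, b, c) == majority
    print(a, b, c, classify(a, b), carry_out(a, b, c))
```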
                 Carry-Save Adder
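• The CSA cell CSA(A1[i], A2[i], A3[i], CIN, S1[i], S2[i], COUT)
  has three outputs:
   – S1[i] = CIN
   – S2[i] = A1[i] ⊕ A2[i] ⊕ A3[i] = PARITY(A1[i], A2[i], A3[i])
   – COUT = MAJ(A1[i], A2[i], A3[i])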
• In CSA the carries are saved at each stage and shifted left onto
  the bus S1.
• There is thus no carry propagation and the delay of a CSA is
  constant.
• At the output of a CSA we need to add the S1 bus (all saved
  carries) and the S2 bus (all the sums) to get an n-bit result
  using a final stage.
                 Carry-Save Adder
• The n-bit sum is encoded in the two buses S1 and S2 in the
  form of parity and majority functions.
• The last stage sums up two input buses using a carry-propagate
  adder.
• Registering the CSA stages (adding vectors of flip-flops between
  them) reduces the adder delay to that of the slowest adder stage.
• By placing registers between the stages of combinational
  logic we use pipelining to increase the speed, at the price
  of increased area and added latency.
• It takes a few clock cycles to fill the pipeline, but once it is
  filled, we get an output every clock cycle.
  CSA Application: Multi-input Adders
• Use k-2 stages of CSAs
   – Keep result in carry-save redundant form
• Final CPA computes actual result
Example: add the four 4-bit operands 0001, 0111, 1101, and 0010
(a trailing "_" marks a carry word shifted left by one bit).

Stage 1, 4-bit CSA:
     0001   X
     0111   Y
   + 1101   Z
     1011   S
    0101_   C

Stage 2, 5-bit CSA:
    0101_   X
     1011   Y
   + 0010   Z
    00011   S
   01010_   C

Final carry-propagate adder:
   01010_   A
 +  00011   B
    10111   S
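The multi-input example can be replayed with a small sketch of carry-save reduction. The function names `carry_save` and `multi_operand_add` are illustrative; Python integers stand in for the buses, and the built-in `sum` plays the role of the final carry-propagate adder.

```python
def carry_save(x, y, z):
    """One CSA stage: reduce three words to a sum word and a shifted carry word."""
    s = x ^ y ^ z                               # bitwise parity
    c = ((x & y) | (x & z) | (y & z)) << 1      # bitwise majority, shifted left one bit
    return s, c

def multi_operand_add(operands):
    """Add k operands with k-2 CSA stages followed by one carry-propagate add."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save(ops[0], ops[1], ops[2])
        ops = [c, s] + ops[3:]                  # result stays in carry-save form
    return sum(ops)                             # final CPA (ordinary addition)

s1, c1 = carry_save(0b0001, 0b0111, 0b1101)
print(format(s1, "04b"), format(c1, "05b"))     # first-stage S and shifted C
print(format(multi_operand_add([0b0001, 0b0111, 0b1101, 0b0010]), "05b"))
```

No carry propagates inside `carry_save`, so each stage has a constant delay regardless of word width; only the last addition needs a carry chain.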
                Carry-Bypass Adder
• In RCA every stage has to wait to make its carry decision until
  the previous stage has been calculated.
• Bypass the critical path.
• Uses the idea that if corresponding bits in the two words to be
  added are not equal, a carry signal into that bit position will be
  passed to the next bit position.
• Improves delay of RCA.
• To bypass the carries for bits 4-7 (stages 5-8) of an adder, we can
  compute BYPASS = P[4]·P[5]·P[6]·P[7] and then use a
  multiplexer (MUX) as follows:
      C[7] = (G[7] + P[7]·C[6])·BYPASS' + C[3]·BYPASS
              Carry-Skip Adder
• Instead of checking the propagate signals, we can
  check the inputs.
• We can compute
      SKIP = (A[i – 1] ⊕ B[i – 1]) · (A[i] ⊕ B[i]) and
      then use a 2:1 MUX to select C[i].
  Thus,
      CSKIP[i] = (G[i] + P[i]·C[i – 1])·SKIP' + C[i – 2]·SKIP
                           Carry-Skip Adder Contd..
       • Carry-ripple is slow through all N stages.
       • Carry-skip/bypass allows carry to skip or bypass over groups
         of n bits, decision-based on n-bit propagate signal.
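Figure: 16-bit carry-skip adder built from four 4-bit blocks (bits 4:1, 8:5, 12:9, and 16:13). Each block adds its slice of A and B to produce S, and its group propagate signal (P4:1, P8:5, P12:9, P16:13) drives a 2:1 MUX: input 1 passes the block's carry-in (Cin, C4, C8, C12) straight to the next block, input 0 selects the block's own ripple carry-out.
Why is a carry-skip/bypass adder faster?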
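A behavioral sketch of a carry-skip adder with fixed-width blocks is given below. The name `carry_skip_add` and its parameters are illustrative. The code models only the logic, not the timing: whenever the group propagate is 1 the rippled carry equals the block carry-in anyway, so the MUX changes how quickly the carry is available, not its value.

```python
def carry_skip_add(a, b, n=16, block=4, cin=0):
    """Add two n-bit integers using ripple blocks with a skip MUX per block."""
    carry = cin
    total = 0
    for lo in range(0, n, block):
        block_cin = carry
        group_p = 1                                  # AND of all propagate bits in the block
        for i in range(lo, min(lo + block, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p, g = ai ^ bi, ai & bi
            total |= (p ^ carry) << i                # sum bit
            carry = g | (p & carry)                  # ripple inside the block
            group_p &= p
        # 2:1 MUX: if every bit propagates, forward the block carry-in directly
        carry = block_cin if group_p else carry
    return total, carry

print(carry_skip_add(0xFFFF, 0x0001))   # carry skips over all four blocks
print(carry_skip_add(0x1234, 0x0FF0))
```

In hardware the benefit is that a carry entering a fully propagating block reaches the next block after one MUX delay instead of four full-adder delays.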
       Carry-Lookahead Adder: Idea
• New look: carry propagation
• Idea:
   – Try to “predict” Ck earlier than Tc*k
   – Instead of passing through k stages, compute Ck separately
     using 1-stage CMOS logic
• Carry propagation: an example
       Bit position   7   6   5   4   3   2   1   0
            Carry     1   0   0   1   1   1   1
                A     0   1   0   0   1   1   0   1
                B     0   1   0   0   0   1   1   1   +
              Sum     1   0   0   1   0   1   0   0
        Carry Propagation: An Example
• In the example:
   – By looking at (A4, B4), we can say for sure that C5=0, no
     matter what C4 is.
   – By looking at (A2, B2), we can say for sure that C3=1.
   – C3, “generated” by (A2, B2) (or by A0, B0 if you wish),
     “propagated” as far as C4, and then got “killed” by (A4, B4)
   – The idea is, can we look at A(3..0) and B(3..0) and
     determine the value of C4 in one shot?
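One way to answer this question is to expand the carry recurrence so that C4 depends only on the generate and propagate bits of A(3..0), B(3..0), and C0. The sketch below assumes the convention of this example, where Ck is the carry into bit k; the function name `lookahead_c4` is illustrative.

```python
def lookahead_c4(a, b, c0=0):
    """Compute C4 (carry into bit 4) from the low four bits in one two-level expression."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # generate bits
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]   # propagate bits
    return (g[3]
            | (p[3] & g[2])
            | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0])
            | (p[3] & p[2] & p[1] & p[0] & c0))

# Low nibbles of the example above: A(3..0) = 1101, B(3..0) = 0111
print(lookahead_c4(0b1101, 0b0111))

# Check against plain addition for every 4-bit input pair and carry-in
for a in range(16):
    for b in range(16):
        for c0 in (0, 1):
            assert lookahead_c4(a, b, c0) == (a + b + c0) >> 4
```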
            Carry-Lookahead Adder
             Brent-Kung Adder
• Consider the recursion C[i] = G[i] + P[i]·C[i-1], where C[i] is the carry out of bit i
• For the first stage, C[0] = G[0] (no carry-in)
• For i = 1, C[1] = G[1] + P[1]·C[0] = G[1] + P[1]·G[0]
• C[2] = G[2] + P[2]·G[1] + P[2]·P[1]·G[0]
• C[3] = G[3] + P[3]·G[2] + P[3]·P[2]·G[1] + P[3]·P[2]·P[1]·G[0]
• or, in general, C[i+1] = G[i+1] + P[i+1]·C[i]
• The Brent-Kung adder reduces the delay and increases the
  regularity of the carry-lookahead scheme.
• The 4-bit CLA using a carry-lookahead generator cell (CLG)
  is shown next.
Brent-Kung Adder
                   Brent-Kung Adder
(a) Carry generation in a 4-bit CLA.
(b) A cell to generate the lookahead terms, C[0]–C[3].
(c) Cells L1, L2, and L3 are rearranged into a tree that has less delay.
    Cell L4 is added to calculate C[2] which is lost in the translation.
(d) and (e) Simplified representations of parts (a) and (c).
(f) The lookahead logic for an 8-bit adder. The inputs, 0–7, are the
    propagate and carry terms formed from the inputs to the adder.
(g) An 8-bit Brent–Kung CLA.
• The outputs of the lookahead logic are the carry bits that
  (together with the inputs) form the sum.
• The advantage of this adder is that delays from the inputs to
  the outputs are more nearly equal than in other adders.
• This reduces the number of unwanted and unnecessary
  switching events and thus reduces power dissipation.
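The tree-structured lookahead can be sketched as a parallel-prefix computation over (generate, propagate) pairs. This is a behavioral model under stated assumptions, not the cell-level circuit in the figure: the word length n is a power of two, the carry-in is 0, and C[i] denotes the carry out of bit i. The function name `brent_kung_add` and the up-sweep/down-sweep loop structure are illustrative choices.

```python
def brent_kung_add(a, b, n=8):
    """Add two n-bit integers (n a power of two, carry-in 0) with a Brent-Kung prefix tree."""
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]   # bit propagate
    G = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]   # running group generate
    P = p[:]                                                  # running group propagate

    def combine(i, j):
        # Merge the lower group ending at bit j into the group ending at bit i.
        G[i] |= P[i] & G[j]
        P[i] &= P[j]

    d = 1
    while d < n:                              # up-sweep: groups of 2, 4, 8, ... bits
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        d *= 2
    d = n // 4
    while d >= 1:                             # down-sweep: fill in the remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        d //= 2

    # G[i] now equals C[i], the carry out of bit i.
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total | (G[n - 1] << n)            # include the final carry as bit n

print(brent_kung_add(0b01001101, 0b01000111))   # 77 + 71
```

The up-sweep needs log2(n) levels and the down-sweep at most log2(n) - 1 more, so the carry depth grows logarithmically with the word length rather than linearly as in the ripple-carry adder.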
Thank You
Beyond the Syllabus
Xilinx FPGA design flow
                       FPGA Vs ASIC
                  Design Flow Comparison
• The FPGA design flow eliminates the
   –   complex and time-consuming floorplanning,
   –   place and route,
   –   timing analysis, and
   –   mask / re-spin stages of the project since the design logic is
       already synthesized to be placed onto an already verified,
       characterized FPGA device.
• Xilinx provides the advanced floorplanning, hierarchical
  design, and timing tools to allow users to maximize
  performance for the most demanding designs.
• Scan insertion and clock tree synthesis are not required in
  FPGA design flows because of the inherent design of Altera
  FPGAs.
FPGA and ASIC Design Flows
   Fundamentally Similar
                             Product Cost
• A product cost has fixed costs and variable costs
  (the number of products sold is the sales volume):
      total product cost = fixed product cost + variable product cost × products sold
• In a product made from parts, the total cost for any part is
      total part cost = fixed part cost + variable cost per part × volume of parts
• For example, suppose we have the following costs:
   – FPGA: $21,800 (fixed), $39 (variable)
   – MGA: $86,000 (fixed), $10 (variable)
   – CBIC: $146,000 (fixed), $8 (variable)
• Then we can calculate the following break-even volumes:
   – FPGA/MGA ≈ 2000 parts
   – FPGA/CBIC ≈ 4000 parts
   – MGA/CBIC ≈ 30,000 parts
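The break-even volumes follow from setting two total-cost lines equal. The sketch below uses the fixed and variable costs from this example; the function names `total_cost` and `break_even` are illustrative.

```python
def total_cost(fixed, variable, volume):
    """Total part cost = fixed part cost + variable cost per part * volume."""
    return fixed + variable * volume

def break_even(fixed_a, var_a, fixed_b, var_b):
    """Volume at which technology A (low fixed, high variable) costs the same as B."""
    return (fixed_b - fixed_a) / (var_a - var_b)

costs = {                      # (fixed cost in $, variable cost in $ per part)
    "FPGA": (21_800, 39),
    "MGA": (86_000, 10),
    "CBIC": (146_000, 8),
}

for a, b in (("FPGA", "MGA"), ("FPGA", "CBIC"), ("MGA", "CBIC")):
    volume = break_even(*costs[a], *costs[b])
    print(f"{a}/{b}: about {volume:,.0f} parts")
```

Below a break-even volume the technology with the lower fixed cost is cheaper in total; above it the lower variable cost wins. With these figures the MGA/CBIC crossing works out to 60,000 / 2 = 30,000 parts.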
                   ASIC Fixed Costs
Examples of fixed costs are:
   –   training cost for a new (EDA) system
   –   hardware and software cost
   –   productivity
   –   production test and design for test
   –   programming costs for an FPGA
   –   nonrecurring-engineering (NRE)
   –   test vectors and test-program development cost
   –   pass (turn or spin)
   –   profit model (represents the profit flow during the product lifetime)
               ASIC Variable Costs
Factors affecting variable costs are:
  –   wafer size and cost
  –   gate density and gate utilization
  –   die size and die cost
  –   die per wafer
  –   defect density
  –   yield
  –   profit margin (depends on fab or fabless)
  –   price per gate
  –   part cost
   16-Bit Constant-Block-Width Carry-Skip Adder
• Single level
   16-Bit Variable-Block-Width Carry-Skip Adder
• Multiple level