Programmable Logic Devices
(PLD)
Shridhar S. Dudam
Agenda
Introduction to PLD
SPLD
CPLD
FPGA
Xilinx FPGA Architecture
Overview of ICs
Logic
  Standard Logic
  ASIC
    PLDs
      SPLDs (PAL/PLA/GAL)
      CPLDs
      FPGAs
    Gate Arrays
    Cell-Based ICs
    Full Custom ICs
What is Programmable Logic?
These devices mainly contain:
1. A large number of logic gates.
2. A large number of flip-flops.
3. Programmable interconnects.
In these devices, the configuration memory defines the function that the logic will perform.
Why go for PLDs?
• Flexibility.
• In-system programmability.
• Less project development time.
• Best prototyping solution.
• Cost-effective solutions.
• Involves less risk.
• Design security.
• Consumes less board area.
• Reconfigurable computing.
• Well suited to hardware verification of a design.
PLD Fundamental
PLD Symbology
Programmable Logic Array (PLA)
PLA Gate Level Diagram
PAL Output Circuit
Typical PAL Architecture
Complex Programmable Logic Devices (CPLD)
Non-volatile configuration memory, such as EPROM or Flash
Complexity between that of PALs/PLAs and FPGAs
Predictable timing
Fast pin-to-pin delays
In-System Programmable (ISP)
CPLD Block Diagram
CPLD Section
Field Programmable Gate Arrays (FPGA)
Dominant implementation technology for digital designs
Can be re-configured to implement any digital logic function
Partial re-configuration allows one portion of the FPGA to keep running while another portion is being re-configured
FPGAs also contain analog circuitry features, including programmable slew rate and drive strength, and differential comparators on I/O designed to be connected to differential signaling channels
Mixed-signal FPGAs contain ADCs and DACs with analog signal conditioning blocks, allowing them to operate as a system-on-chip (SoC)
FPGA Block Diagram
FPGA Operation
The user writes the configuration memory, which defines the function of the system. This includes the connectivity between the CLBs and the I/O cells, the logic implemented in the CLBs, and the configuration of the I/O blocks.
Changing the data in the configuration memory changes the function of the system. This change can be made at any time during FPGA operation (run-time configuration).
FPGA Configuration Memory
SRAM:
  SRAM cells hold both the logic configuration and the interconnect settings
  Reprogrammable
  Volatile
  Requires extra circuitry to program
  Best for prototyping
  Vendors: Altera, Xilinx, Lattice
Flash:
  Nonvolatile
  Can be programmed several times
  Expensive
  Vendor: Actel
Antifuse:
  Small size
  Nonvolatile
  One-time programmable (OTP)
  Expensive
  Vendor: Quicklogic
FPGA Architectures
Early FPGAs
N x N array of unit cells (CLB + routing)
Special routing along center axis
Next Generation FPGAs
M x N unit cells
Small block RAMs around edges
More recent FPGAs
Added block RAM arrays
Added multiplier cores
Added processor cores
FPGA Architecture Trends
Memories
  Single- and dual-port RAMs
  FIFO (first-in first-out)
  ECC (error correcting codes)
Digital Signal Processors
  Multipliers
  Accumulators
  Arithmetic Logic Units (ALUs)
Embedded Processors
  Hard core (dedicated processors)
    Dedicated program and data memories
    Programmable RAM in the FPGA can be used in conjunction with the processor to provide program and data memories
  Soft core (synthesized from an HDL)
Configurable Logic Blocks (CLBs) Architecture
CLBs consist of:
Look-up Tables (LUTs), which implement the entries of a logic function's truth table
  Some FPGAs can use LUTs to implement small Random Access Memories (RAMs)
Carry and control logic
  Implements fast arithmetic operations (adders/subtractors)
  Can also be configured for additional operations (Built-In Self-Test, iterative-OR chain)
Memory elements
  Configurable flip-flops (FFs)/latches (programmable clock edges, set/reset, and clock enable)
  These memory elements can usually be configured as shift registers
Configurable Logic Blocks
A CLB is made up of several slices.
Xilinx Virtex-5 CLBs contain two slices, each either a SLICEL (logic) or a SLICEM (memory).
In addition to the basic CLB architecture, the Virtex-5 contains wide-function MUXes, which can implement:
- 4:1 MUX using 1 LUT
- 8:1 MUX using 2 LUTs
- 16:1 MUX using 4 LUTs
FPGA Two Input Lookup Table
FPGA Three Input Lookup Table
Inclusion of Flip-flop in FPGA Logic Block
Look-up Tables (2:1 MUX Example)
Configuration memory holds the output values of the truth table entries
Internal signals connect to the select inputs of the MUXes, which pick the truth table value for any given combination of input signals
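As a rough illustration of the MUX-tree view of a LUT, the following Python sketch models the configuration memory as a list of truth table outputs and reads it through successive 2:1 MUX stages. The function name `lut_read`, the example configurations, and the bit ordering (the first input is the least significant select) are assumptions made for this sketch, not part of any vendor tool.

```python
def lut_read(config_bits, inputs):
    """Return the LUT output for the given input bits.

    config_bits: 2**k truth table outputs held in configuration memory,
                 indexed so that entry 0 corresponds to all inputs low.
    inputs:      k input bits (0 or 1); inputs[0] drives the first MUX stage.
    """
    if len(config_bits) != 2 ** len(inputs):
        raise ValueError("need 2**k configuration bits for k inputs")
    level = list(config_bits)
    for sel in inputs:
        # One stage of 2:1 MUXes: each MUX keeps one entry of a pair.
        level = [level[i + sel] for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical configurations for a 2-input LUT with inputs (a, b).
XOR_CONFIG = [0, 1, 1, 0]
AND_CONFIG = [0, 0, 0, 1]
```

Loading a different list into the same structure changes the logic function without changing the hardware, which is the point of storing the truth table in configuration memory.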
LUT-Based RAM
Normal LUT mode performs the read operation
Address decoders gated with WE generate the clock signals to the latches for the write operation
Smaller RAMs can be combined to create larger RAMs (up to 64-bit in Virtex-5)
Interconnection Matrix (1)
Programmable Interconnection Network
Horizontal and vertical mesh of wire segments interconnected by programmable switches called programmable interconnect points (PIPs).
Each PIP is implemented using a transmission gate controlled by a memory bit from the configuration memory.
Several types of PIPs are used:
Cross-point = connects vertical and horizontal wire segments, allowing turns
Break-point = connects or isolates 2 wire segments
Decoded MUX = group of 2^n cross-points connected to a single output, configured by n configuration bits
Non-decoded MUX = n wire segments, each with its own configuration bit (n bits for n segments)
Compound cross-point = 6 break-point PIPs (can route two isolated signal nets)
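The difference between the decoded and non-decoded MUX PIPs can be sketched in a few lines of Python. This is only a behavioural illustration: the function names and the representation of wire segments as list entries are assumptions for the example, and real PIPs are pass gates rather than list lookups.

```python
def decoded_mux(segments, config_bits):
    """Decoded MUX PIP: n configuration bits select one of 2**n segments."""
    if len(segments) != 2 ** len(config_bits):
        raise ValueError("need 2**n segments for n configuration bits")
    # config_bits[0] is treated as the least significant select bit.
    index = sum(bit << i for i, bit in enumerate(config_bits))
    return segments[index]


def non_decoded_mux(segments, config_bits):
    """Non-decoded MUX PIP: one configuration bit per segment.

    Returns every segment whose pass gate is turned on; a sensible
    configuration turns on exactly one.
    """
    if len(segments) != len(config_bits):
        raise ValueError("need one configuration bit per segment")
    return [seg for seg, bit in zip(segments, config_bits) if bit]
```

The decoded form needs fewer configuration bits (n instead of 2^n for the same number of segments) at the cost of decode logic.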
FPGA Implementation
Programmable Input/Output Cells
Bi-directional buffers
  Programmable for inputs or outputs
  Tri-state control enables bi-directional operation
  Pull-up/pull-down resistors
FFs/latches are used to improve timing
  Set-up and hold times
  Clock-to-out delay
Routing resources
  Connections to the core of the array
Programmable I/O voltage and current levels
FPGA Configuration Interfaces
Master (Serial or Parallel)
  FPGA retrieves its configuration from ROM at initial power-up
Slave (Serial or Parallel)
  FPGA is configured by an external source (e.g., a microprocessor or another FPGA)
  Used for dynamic partial re-configuration
Boundary Scan
  4-wire IEEE standard serial interface (IEEE 1149.1, JTAG) used for testing
  Write and read access to configuration memory
  Interfaces to the FPGA core's internal routing network
Boundary Scan Configuration
Developed to test the interconnect between chips on a PCB
Test Access Port (TAP) controller composed of a 16-state FSM
[Figure: Daisy Chain Configuration]
[Figure: Multi-FPGA Emulation Framework to Support NoC Design and Verification (UNLV NSIL)]
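The 16-state TAP controller can be written down as a transition table driven by the TMS pin, following the state diagram in IEEE 1149.1. The Python sketch below is only a software model for exploring the state sequence; the helper names `tap_step` and `tap_walk` are made up for this example.

```python
# Each entry maps a state to (next state if TMS=0, next state if TMS=1).
TAP_TRANSITIONS = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def tap_step(state, tms):
    """Advance the TAP controller by one TCK rising edge."""
    return TAP_TRANSITIONS[state][tms]


def tap_walk(state, tms_bits):
    """Apply a sequence of TMS values and return the final state."""
    for tms in tms_bits:
        state = tap_step(state, tms)
    return state
```

A useful property to check with this model is that holding TMS high for five clocks reaches Test-Logic-Reset from any state, which is how a tester resynchronises a chain of devices.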
CPLD and FPGA Difference
               Complex Programmable Logic   Field-Programmable Gate Array
               Device (CPLD)                (FPGA)
Architecture   PAL/22V10-like               Gate array-like
               More combinational           More registers + RAM
Density        Low-to-medium                Medium-to-high
Performance    Predictable timing           Application dependent
Interconnect   "Crossbar switch"            Incremental
Technology     EPROM/Flash                  SRAM/Antifuse/Flash
Programmable Logic Solutions
• No high development cost barriers
• Recovered time for authoring and innovating
– SW improvements reduce design iterations
• No lengthy prototyping cycle
• Ability to remotely upgrade any networked
system
• Ultimate flexibility to manage rapid change
Custom Chips: Standard Cells
A section of two arrays in a Standard Cell Chip
PLD Vendors
Xilinx
Altera
Lattice
Quicklogic
Xilinx FPGA Architecture
Outline
Overview
Slice Resources
I/O Resources
Memory and Clocking
Spartan-3, Spartan-3E, and Virtex-II Pro Features
Virtex-4 Features
Summary
Overview
All Xilinx FPGAs contain the same basic resources
Slices (grouped into CLBs)
Contain combinatorial logic and register resources
IOBs
Interface between the FPGA and the outside world
Programmable interconnect
Other resources
Memory
Multipliers
Global clock buffers
Boundary scan logic
The Spartan-3 Solution
A New Class of Spartan FPGAs
18x18-bit embedded pipelined multipliers for efficient DSP
Configurable 18K block RAMs + distributed RAM
4 I/O banks (Bank 0 to Bank 3), with support for all I/O standards including PCI, DDR333, RSDS, mini-LVDS
Up to eight on-chip Digital Clock Managers to support multiple system clocks
Virtex-II Pro Platform FPGA
• 3.125 Gbps Multi-Gigabit Transceivers (MGTs)
  – Supports 10 Gbps standards
  – Up to 24 per device
• IP-Immersion™ Fabric
  – Active Interconnect™
  – 18Kb Dual-Port RAM
  – Xtreme™ Multipliers
  – 16 Global Clock Domains
• PowerPC 405 Core
  – 300+ MHz / 450+ DMIPS performance
  – Up to 4 per device
Slices and CLBs
Each Virtex-II CLB contains four slices
  Local routing provides feedback between slices in the same CLB, and it provides routing to neighboring CLBs
A switch matrix provides access to general routing resources
[Figure: CLB with slices S0 to S3, switch matrix, local routing, BUFTs, SHIFT chain, and CIN/COUT carry connections]
Simplified Slice Structure
Each slice has four outputs
  Two registered outputs, two non-registered outputs
Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
Carry logic runs vertically, up only
  Two independent carry chains per CLB
[Figure: Slice 0, with two LUT + carry + register paths (register pins D, CE, PRE, CLR, Q)]
Detailed Slice Structure
The next few slides discuss the slice features
LUTs
MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in this diagram)
Carry Logic
MULT_ANDs
Sequential Elements
Look-Up Tables
Combinatorial logic is stored in Look-Up Tables (LUTs)
  Also called Function Generators (FGs)
  Capacity is limited by the number of inputs, not by the complexity
  Delay through the LUT is constant
[Figure: combinatorial logic with inputs A, B, C, D and output Z]
A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
Connecting Look-Up Tables
MUXF5 combines LUTs in each slice
MUXF6 combines slices S0 and S1
MUXF6 combines slices S2 and S3
MUXF7 combines the two MUXF6 outputs
MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
[Figure: CLB with slices S0 to S3, one F5 MUX per slice, and the F6, F7, and F8 MUXes]
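Combining two 4-input LUTs through MUXF5 amounts to a Shannon expansion on a fifth input: one LUT holds the function for that input at 0, the other for that input at 1, and the MUX chooses between them. The Python sketch below illustrates the idea; the names `lut4`, `muxf5`, and `split_5_input`, and the packing of a LUT configuration into a 16-bit integer, are assumptions for this example. The same construction repeats at the F6, F7, and F8 levels with one more input each time.

```python
def lut4(config, a, b, c, d):
    """4-input LUT: config is a 16-bit integer, bit i is the output for index i."""
    return (config >> (a | b << 1 | c << 2 | d << 3)) & 1


def muxf5(lut_f, lut_g, a, b, c, d, sel):
    """Two 4-input LUTs plus the F5 MUX behave as one 5-input function."""
    return lut4(lut_g if sel else lut_f, a, b, c, d)


def split_5_input(func):
    """Derive the two LUT configurations for a Python function of five bits.

    Shannon expansion on the fifth argument: the first configuration covers
    sel=0, the second covers sel=1.
    """
    configs = []
    for sel in (0, 1):
        cfg = 0
        for idx in range(16):
            a, b, c, d = idx & 1, idx >> 1 & 1, idx >> 2 & 1, idx >> 3 & 1
            cfg |= func(a, b, c, d, sel) << idx
        configs.append(cfg)
    return configs


def parity5(a, b, c, d, e):
    """Example 5-input function used to exercise the sketch."""
    return a ^ b ^ c ^ d ^ e
```

For example, `split_5_input(parity5)` returns the two configurations to load, and `muxf5` with those configurations reproduces `parity5` for every input combination.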
Fast Carry Logic
Simple, fast, and complete arithmetic logic
  Dedicated XOR gate for single-level sum completion
  Uses dedicated routing resources
  All synthesis tools can infer carry logic
[Figure: the two carry chains in a CLB. First carry chain: slices S2 and S3, COUT to CIN of S2 of the next CLB. Second carry chain: slices S0 and S1, COUT to S0 of the next CLB.]
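A behavioural Python sketch of an adder built on such a carry chain is given below. It assumes the usual arrangement in which the LUT computes the propagate signal, a carry MUX either passes the incoming carry or generates a new one, and the dedicated XOR completes the sum; the function and helper names are invented for the example.

```python
def to_bits(value, width):
    """Split an integer into a list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def carry_chain_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists using LUT + carry MUX + XOR per bit."""
    sums, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b                    # computed in the LUT
        sums.append(propagate ^ carry)       # dedicated XOR gate: sum bit
        carry = carry if propagate else a    # carry MUX: pass CIN or generate
    return sums, carry
```

Only the carry signal travels from bit to bit, which is why a dedicated vertical route for it makes the whole adder fast.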
MULT_AND Gate
Highly efficient multiply and add implementation
  Earlier FPGA architectures require two LUTs per bit to perform the multiplication and addition
  The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit
[Figure: LUT with inputs A and B, MULT_AND producing AxB on DI, CY_MUX with select S, carry-in CI and carry-out CO, and CY_XOR]
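The one-LUT-per-bit claim can be illustrated with a behavioural model of one row of a shift-and-add multiplier, computing P + (A AND b) for a single multiplier bit b. The sketch assumes the signal roles suggested by the figure labels (the LUT output drives the CY_MUX select, the MULT_AND output drives DI); the function name `mult_add_row` is made up for this example.

```python
def mult_add_row(p_bits, a_bits, b, carry_in=0):
    """Compute P + (A AND b) bit by bit, least significant bit first.

    p_bits: running partial sum, a_bits: multiplicand, b: one multiplier bit.
    Returns (sum_bits, carry_out).
    """
    result, carry = [], carry_in
    for p, a in zip(p_bits, a_bits):
        partial = a & b                        # MULT_AND output, feeds DI
        select = p ^ partial                   # LUT output, select of CY_MUX
        result.append(select ^ carry)          # CY_XOR: sum bit
        carry = carry if select else partial   # CY_MUX: pass CI or take DI
    return result, carry
```

Without the dedicated AND gate, the partial product would need its own LUT before the adder LUT, which is the two-LUTs-per-bit cost mentioned above.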
Flexible Sequential Elements
Either flip-flops or latches
Two in each slice; eight in each CLB
Inputs come from LUTs or from an independent CLB input
Separate set and reset controls
  Can be synchronous or asynchronous
All controls are shared within a slice
  Control signals can be inverted locally within a slice
[Figure: FDRSE_1 (D, S, CE, R, Q), FDCPE (D, PRE, CE, CLR, Q), and LDCPE (D, PRE, CE, G, CLR, Q) primitives]
Shift Register LUT (SRL16CE)
Dynamically addressable serial shift registers
  Maximum delay of 16 clock cycles per LUT (128 per CLB)
  Cascadable to other LUTs or CLBs for longer shift registers
    Dedicated connection from Q15 to D input of the next SRL16CE
Shift register length can be changed asynchronously by toggling address A
[Figure: LUT as a chain of D/CE/Q registers clocked by CLK, with address A[3:0] selecting output Q, and Q15 as cascade out]
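The behaviour described above can be modelled in a few lines of Python. This is a sketch of the SRL16CE behaviour as described on the slide, not the vendor simulation model; the class layout and method names are assumptions for the example.

```python
class SRL16CE:
    """Behavioural sketch of a 16-bit shift register LUT with clock enable."""

    def __init__(self, init=0):
        # bits[0] is the most recently shifted-in value, bits[15] the oldest.
        self.bits = [(init >> i) & 1 for i in range(16)]

    def clock(self, d, ce=1):
        """Rising clock edge: shift in d when the clock enable is high."""
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def q(self, addr):
        """Addressable output: reading is combinational, no clock needed."""
        return self.bits[addr & 0xF]

    @property
    def q15(self):
        """Cascade output, intended for the D input of the next SRL16CE."""
        return self.bits[15]
```

With address A the output is the input delayed by A + 1 clock cycles, so A = 15 gives the maximum delay of 16 cycles, and changing A changes the effective length without any clock edge.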
Shift Register LUT Example
The SRL can be used to create a No Operation (NOP)
  This example uses 64 LUTs (8 CLBs) to replace 576 flip-flops (72 CLBs) and associated routing and delays
[Figure: two 64-bit paths, statically balanced at 12 cycles each. Operation A (4 cycles) then Operation B (8 cycles); Operation C (3 cycles) then Operation D - NOP (9 cycles)]
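The resource figures in this example follow from simple arithmetic: a 64-bit bus delayed by 9 cycles needs 64 x 9 flip-flops, or one SRL16 per bit. The Python sketch below reproduces the numbers, assuming 8 LUTs and 8 flip-flops per CLB as stated elsewhere in these slides; the function name and parameter names are made up for the example.

```python
from math import ceil


def delay_line_cost(bus_width, delay_cycles, srl_depth=16,
                    luts_per_clb=8, ffs_per_clb=8):
    """Compare a flip-flop delay line with an SRL-based one."""
    flip_flops = bus_width * delay_cycles
    srl_luts = bus_width * ceil(delay_cycles / srl_depth)
    return {
        "flip_flops": flip_flops,
        "ff_clbs": ceil(flip_flops / ffs_per_clb),
        "srl_luts": srl_luts,
        "srl_clbs": ceil(srl_luts / luts_per_clb),
    }


# The 9-cycle NOP on a 64-bit path from the example above:
# 576 flip-flops (72 CLBs) versus 64 LUTs (8 CLBs).
NOP_COST = delay_line_cost(bus_width=64, delay_cycles=9)
```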
IOB Element
Input path
  Two DDR registers
Output path
  Two DDR registers
  Two 3-state enable DDR registers
Separate clocks and clock enables for I and O
Set and reset signals are shared
[Figure: IOB with input registers (ICK1, ICK2), output registers and DDR MUX (OCK1, OCK2), 3-state registers and DDR MUX (OCK1, OCK2), and the PAD]
SelectIO Standard
Allows direct connections to external signals of varied
voltages and thresholds
Optimizes the speed/noise tradeoff
Saves having to place interface components onto your board
Differential signaling standards
LVDS, BLVDS, ULVDS
LDT
LVPECL
Single-ended I/O standards
LVTTL, LVCMOS (3.3V, 2.5V, 1.8V, and 1.5V)
PCI-X at 133 MHz, PCI (3.3V at 33 MHz and 66 MHz)
GTL, GTLP
and more!
Digitally Controlled Impedance
DCI provides
Output drivers that match the impedance of the traces
On-chip termination for receivers and transmitters
DCI advantages
Improves signal integrity by eliminating stub
reflections
Reduces board routing complexity and component
count by eliminating external resistors
Eliminates the effects of temperature, voltage, and
process variations by using an internal feedback
circuit
Other Virtex-II Features
Distributed RAM and block RAM
  Distributed RAM uses the CLB resources (1 LUT = 16 RAM bits)
  Block RAM is a dedicated resource on the device (18-kb blocks)
Dedicated 18 x 18 multipliers next to block RAMs
Clock management resources
  Sixteen dedicated global clock multiplexers
  Digital Clock Managers (DCMs)
Distributed SelectRAM Resources
Uses a LUT in a slice as memory
Synchronous write
Asynchronous read
  Accompanying flip-flops can be used to create synchronous read
RAM and ROM are initialized during configuration
  Data can be written to RAM after configuration
Emulated dual-port RAM
  One read/write port
  One read-only port
[Figure: RAM16X1S (one LUT; D, WE, WCLK, A0-A3, O), RAM32X1S (two LUTs in a slice; D, WE, WCLK, A0-A4, O), and RAM16X1D (two LUTs; D, WE, WCLK, A0-A3, SPO, DPRA0-DPRA3, DPO)]
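A behavioural Python sketch of the emulated dual-port primitive is shown below, using the pin names from the RAM16X1D symbol (A, DPRA, SPO, DPO). It models the two LUTs as a single shared array, which is a simplification, and the class and method names are assumptions for this example.

```python
class Ram16x1D:
    """Sketch of a 16x1 distributed RAM with one read/write and one read-only port."""

    def __init__(self, init=0):
        # Contents can be initialized at configuration time (init is a 16-bit value).
        self.cells = [(init >> i) & 1 for i in range(16)]

    def clock(self, we, a, d):
        """Rising WCLK edge: synchronous write through the read/write port."""
        if we:
            self.cells[a & 0xF] = d & 1

    def spo(self, a):
        """Asynchronous read on the read/write port address A."""
        return self.cells[a & 0xF]

    def dpo(self, dpra):
        """Asynchronous read on the read-only port address DPRA."""
        return self.cells[dpra & 0xF]
```

Registering `spo` or `dpo` in the accompanying slice flip-flop would give the synchronous read mentioned above.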
Block SelectRAM Resources
Up to 3.5 Mb of RAM in 18-kb blocks
  Synchronous read and write
True dual-port memory
  Each port has synchronous read and write capability
  Different clocks for each port
Supports initial values
Synchronous reset on output latches
Supports parity bits
  One parity bit per eight data bits
[Figure: 18-kb block SelectRAM memory with port A (DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, DOPA) and port B (DIB, DIPB, ADDRB, WEB, ENB, SSRB, CLKB, DOB, DOPB)]
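The 18-kb block size and the one-parity-bit-per-eight-data-bits rule fit together arithmetically, as the short Python sketch below shows. It assumes that "18 kb" means 18 x 1024 bits; the function names and the example data widths are chosen for illustration only.

```python
BLOCK_BITS = 18 * 1024  # assumption: 18 kb = 18 * 1024 bits


def split_block(block_bits=BLOCK_BITS, data_bits_per_parity=8):
    """Split a block into data and parity bits (one parity bit per 8 data bits)."""
    parity_bits = block_bits // (data_bits_per_parity + 1)
    data_bits = block_bits - parity_bits
    return data_bits, parity_bits


def depth_for_width(data_width, block_bits=BLOCK_BITS):
    """Number of words available for a given data width, parity excluded."""
    data_bits, _ = split_block(block_bits)
    return data_bits // data_width


DATA_BITS, PARITY_BITS = split_block()  # 16384 data bits and 2048 parity bits
```

Under these assumptions a 32-bit data word with its 4 parity bits gives 512 words per block, and halving the width doubles the depth.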
Dedicated Multiplier Blocks
18-bit twos complement signed operation
Optimized to implement Multiply and Accumulate functions
Multipliers are physically located next to block SelectRAM™ memory
[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and Output (36 bits); supports 4 x 4, 8 x 8, 12 x 12, and 18 x 18 signed operation]
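The following Python sketch shows what 18-bit two's complement signed operation with a 36-bit output means at the bit level. It is a numerical model only; the helper names `to_signed` and `mult18x18` are invented for the example.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mult18x18(data_a, data_b):
    """Multiply two raw 18-bit operands and return the raw 36-bit product."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)
```

A 36-bit output is wide enough for every operand pair: the largest magnitude product is (-2^17) x (-2^17) = 2^34. Narrower signed operands, such as 8 x 8, can use the same block after sign extension to 18 bits.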
Global Clock Routing Resources
Sixteen dedicated global clock multiplexers
Eight on the top-center of the die, eight on the
bottom-center
Driven by a clock input pad, a DCM, or local routing
Global clock multiplexers provide the following:
Traditional clock buffer (BUFG) function
Global clock enable capability (BUFGCE)
Glitch-free switching between clock signals
(BUFGMUX)
Up to eight clock nets can be used in each clock
region of the device
Each device contains four or more clock regions
Digital Clock Manager (DCM)
Up to twelve DCMs per device
Located on the top and bottom edges of the die
Driven by clock input pads
DCMs provide the following:
Delay-Locked Loop (DLL)
Digital Frequency Synthesizer (DFS)
Digital Phase Shifter (DPS)
Up to four outputs of each DCM can drive onto
global clock buffers
All DCM outputs can drive general routing
Spartan-3 versus Virtex-II
Lower cost
Smaller process = lower core voltage
  .09 micron versus .15 micron
  Vccint = 1.2V versus 1.5V
Different I/O standard support
  New standards: 1.2V LVCMOS, 1.8V HSTL, and SSTL
  Default is LVCMOS, versus LVTTL
More I/O pins per package
Only one-half of the slices support RAM or SRL16s (SLICEM)
Fewer block RAMs and multiplier blocks
  Same size and functionality
Eight global clock multiplexers
Two or four DCM blocks
No internal 3-state buffers
  3-state buffers are in the I/O
SLICEM and SLICEL
Each Spartan™-3 CLB contains four slices
  Similar to the Virtex™-II
Slices are grouped in pairs
  Left-hand SLICEM (Memory)
    LUTs can be configured as memory or SRL16
  Right-hand SLICEL (Logic)
    LUT can be used as logic only
[Figure: CLB with left-hand SLICEM pair (X0Y0, X0Y1) and right-hand SLICEL pair (X1Y0, X1Y1), switch matrix, fast connects, SHIFTIN/SHIFTOUT, and CIN/COUT]
Spartan-3E Features
More gates per I/O than Spartan-3
Removed some I/O standards
  Higher-drive LVCMOS
  GTL, GTLP
  SSTL2_II
  HSTL_II_18, HSTL_I, HSTL_III
  LVDS_EXT, ULVDS
DDR Cascade
  Internal data is presented on a single clock edge
16 BUFGMUXes on left and right sides
  Drive half the chip only
  In addition to eight global clocks
Pipelined multipliers
Additional configuration modes
  SPI, BPI
  Multi-Boot mode
Virtex-II Pro Features
0.13 micron process
Up to 24 RocketIO™ Multi-Gigabit Transceiver (MGT)
blocks
Serializer and deserializer (SERDES)
Fibre Channel, Gigabit Ethernet, XAUI, Infiniband compliant
transceivers, and others
8-, 16-, and 32-bit selectable FPGA interface
8B/10B encoder and decoder
PowerPC™ RISC processor blocks
Thirty-two 32-bit General Purpose Registers (GPRs)
Low power consumption: 0.9mW/MHz
IBM CoreConnect bus architecture support
Virtex-4 Architecture Has the Most Advanced Feature Set
RocketIO™ Multi-Gigabit Transceivers: 622 Mbps–10.3 Gbps
Smart RAM: new block RAM/FIFO
Xesium Clocking Technology: 500 MHz
Advanced CLBs: 200K logic cells
Tri-Mode Ethernet MAC: 10/100/1000 Mbps
XtremeDSP™ Technology Slices: 256 18x18 GMACs
1 Gbps SelectIO™: ChipSync™ source synchronous, XCITE active termination
PowerPC™ 405 with APU Interface: 450 MHz, 680 DMIPS
Choose the Platform that Best Fits the Application
Resource        LX              FX               SX
Logic           14K–200K LCs    12K–140K LCs     23K–55K LCs
Memory          0.9–6 Mb        0.6–10 Mb        2.3–5.7 Mb
DCMs            4–12            4–20             4–8
DSP Slices      32–96           32–192           128–512
SelectIO        240–960         240–896          320–640
RocketIO        N/A             0–24 Channels    N/A
PowerPC         N/A             1 or 2 Cores     N/A
Ethernet MAC    N/A             2 or 4 Cores     N/A
Review Questions
List the primary slice features
List the three ways a LUT can be
configured
Answers
List the primary slice features
Look-up tables and function generators (two per
slice, eight per CLB)
Registers (two per slice, eight per CLB)
Dedicated multiplexers (MUXF5, MUXF6, MUXF7,
MUXF8)
Carry logic
MULT_AND gate
List the three ways a LUT can be configured
Combinatorial logic
Shift register (SRL16CE)
Distributed memory
Summary
Slices contain LUTs, registers, and carry logic
LUTs are connected with dedicated multiplexers
and carry logic
LUTs can be configured as shift registers or
memory
IOBs contain DDR registers
SelectIO™ standards and DCI enable direct
connection to multiple I/O standards while reducing
component count
Virtex™-II memory resources include the following:
Distributed SelectRAM™ resources and distributed
SelectROM (uses CLB LUTs)
18-kb block SelectRAM resources
Summary
The Virtex™-II devices contain dedicated
18x18 multipliers next to each block
SelectRAM™ resource
Digital clock managers provide the
following:
Delay-Locked Loop (DLL)
Digital Frequency Synthesizer (DFS)
Digital Phase Shifter (DPS)
Where Can I Learn More?
User Guides
  www.xilinx.com → Documentation → User Guides
Application Notes
  www.xilinx.com → Documentation → Application Notes
Education resources
  Designing with the Virtex-4 Family course
  Spartan-3E Architecture free recorded e-Learning
Questions?