Xapp790 7 Series Clock Gating
Summary
This application note introduces FPGA designers to intelligent clock gating. It describes clock
gating support in the Xilinx design tools and supplies a detailed analysis of the impact of
clock gating on a design from a logic design and power perspective. It also outlines how to
access and invoke clock gating support in the Xilinx design tools flow and how to analyze the
results.
Introduction
Intelligent clock gating is a set of algorithms that can detect unnecessary switching in the
design and suppress it. This fully automated method adds a small amount of logic to suppress
and minimize nonessential activity in the design, which reduces the power consumed by the
design.
Power Terminology
There are two components of power: dynamic power and static power. Dynamic power is given by
Equation 1:

$$\text{Dynamic Power} = \alpha \times f_{CLK} \times C \times V^{2} \qquad \text{(Equation 1)}$$

where:
• α is the activity (percentage) of a circuit that switches in each cycle
• f_CLK is the frequency of the clock
• V is the supply voltage
• C is the capacitance
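As a rough illustration of Equation 1, the Python sketch below evaluates dynamic power before and after the activity factor is reduced. The function name and every numeric value are hypothetical, chosen only to make the arithmetic easy to follow; they are not measurements from any device.

```python
def dynamic_power(alpha, f_clk_hz, c_farads, v_volts):
    """Equation 1: P = alpha * f_CLK * C * V^2 (result in watts)."""
    return alpha * f_clk_hz * c_farads * v_volts ** 2


# Hypothetical operating point: 200 MHz clock, 1 nF switched capacitance, 1.0 V.
F_CLK = 200e6
C = 1e-9
V = 1.0

p_before = dynamic_power(0.25, F_CLK, C, V)   # 25% of the circuit switches per cycle
p_after = dynamic_power(0.125, F_CLK, C, V)   # activity halved, e.g. by clock gating

print(f"before: {p_before * 1e3:.1f} mW, after: {p_after * 1e3:.1f} mW")
```

Because dynamic power is linear in α, halving the activity halves the dynamic power, whereas the V² term makes the supply voltage a quadratic factor.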
The device static power represents the transistor leakage power when the programmable
device is powered and not configured.
© Copyright 2012 Xilinx, Inc. Xilinx, the Xilinx logo, Artix, ISE, Kintex, Spartan, Virtex, Zynq, and other designated brands included herein are trademarks of Xilinx in the United
States and other countries. All other trademarks are the property of their respective owners.
Clock Gating in Xilinx FPGAs
The Xilinx Design Suite supports clock gating from the pre-design stage to the post-route
stage. The Xilinx Power Estimator (XPE) spreadsheet tool (download at
http://www.xilinx.com/power) helps a new user understand the magnitude of power savings for
a particular design when power optimization is enabled in the Xilinx tools. In 7 series FPGAs,
each slice contains eight flip-flops that share a common clock enable. As shown in Figure 2,
the clock enable locally gates the clock and also stops the flip-flop from toggling.
The ISE® design tools can automatically suppress unnecessary switching by looking for cases
where flip-flop outputs are not used by a downstream target. This is done by examining the
logic after synthesis; the tool then generates local clock enables. Slice clock gating does
not re-synthesize the original design. Instead, it creates additional LUTs or uses existing
ones, so the changes are incremental and the pre-existing logic is unchanged.
[Figure 2: Canceled transitions. Before and after circuits produce an equivalent output.]
• Before: SIG does not influence the output whenever it toggles (and so consumes unnecessary power)
• After: SIG only transitions when used; CE is generated to control SIG actively
library IEEE;
use IEEE.std_logic_1164.all;

entity multiplexer is
  port (CLK                    : in  std_logic;
        SEL_IN                 : in  std_logic_vector (2 downto 0);
        A_IN, B_IN, C_IN, D_IN : in  std_logic_vector (7 downto 0);
        E_IN, F_IN, G_IN, H_IN : in  std_logic_vector (7 downto 0);
        SIG                    : out std_logic_vector (7 downto 0));
end multiplexer;

architecture RTL of multiplexer is
  signal SEL_INR, SEL           : std_logic_vector(2 downto 0);
  signal A, B, C, D, E, F, G, H : std_logic_vector(7 downto 0);
begin

  -- purpose: register inputs (SEL_IN is registered twice)
  -- type   : sequential
  -- inputs : CLK, A_IN, B_IN, C_IN, D_IN, E_IN, F_IN, G_IN, H_IN, SEL_IN
  -- outputs: A, B, C, D, E, F, G, H, SEL
  process (CLK)
  begin  -- process
    if rising_edge(CLK) then
      SEL_INR <= SEL_IN;
      SEL     <= SEL_INR;
      A       <= A_IN;
      B       <= B_IN;
      C       <= C_IN;
      D       <= D_IN;
      E       <= E_IN;
      F       <= F_IN;
      G       <= G_IN;
      H       <= H_IN;
    end if;
  end process;

  -- purpose: multiplexer 8-to-1
  -- type   : sequential
  -- inputs : CLK, A, B, C, D, E, F, G, H, SEL
  -- outputs: SIG
  process (CLK)
  begin  -- process
    if rising_edge(CLK) then
      case SEL is
        when "000"  => SIG <= A;
        when "001"  => SIG <= B;
        when "010"  => SIG <= C;
        when "011"  => SIG <= D;
        when "100"  => SIG <= E;
        when "101"  => SIG <= F;
        when "110"  => SIG <= G;
        when "111"  => SIG <= H;
        when others => SIG <= A;  -- covers non-binary values of SEL
      end case;
    end if;
  end process;

end RTL;
[Figure 3: 8-to-1 multiplexer with 8-bit inputs A through H, a 3-bit SEL, and an 8-bit Out.]
• Before: the multiplexer as described in RTL
• After: intelligent clock gating controls the CEs (generated in LUTs) of the input registers to reduce switching power
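The effect on the multiplexer example can be explored with a small behavioral model. The Python sketch below is not the algorithm used by the Xilinx tools. It assumes that the enable for input register k is derived from the pre-registered select (SEL_INR = k), so a register loads only when its new value will be selected on the following clock edge. The function name, the random stimulus, and the use of bit toggles as a stand-in for switching activity are all illustrative assumptions.

```python
import random


def simulate(stimulus, gated):
    """Cycle-based model of the registered 8-to-1 multiplexer.

    stimulus: list of (sel_in, data_in) per clock; data_in holds 8 byte values.
    gated:    if True, input register k loads only when SEL_INR == k.
    Returns (trace of SIG after each clock, bit toggles in the input registers).
    """
    sel_inr, sel, sig = 0, 0, 0
    regs = [0] * 8
    toggles = 0
    trace = []
    for sel_in, data_in in stimulus:
        # Next state is computed from the current state, as on a clock edge.
        next_sig = regs[sel]
        next_regs = list(regs)
        for k in range(8):
            ce = (sel_inr == k) if gated else True
            if ce:
                toggles += bin(regs[k] ^ data_in[k]).count("1")
                next_regs[k] = data_in[k]
        sel_inr, sel, sig, regs = sel_in, sel_inr, next_sig, next_regs
        trace.append(sig)
    return trace, toggles


random.seed(0)
stimulus = [(random.randrange(8), [random.randrange(256) for _ in range(8)])
            for _ in range(1000)]
sig_plain, toggles_plain = simulate(stimulus, gated=False)
sig_gated, toggles_gated = simulate(stimulus, gated=True)

assert sig_plain == sig_gated   # the output is identical cycle for cycle
print("input-register bit toggles:", toggles_plain, "ungated,", toggles_gated, "gated")
```

In this model the two-stage select pipeline is what makes the gating possible: the select value is known one cycle before the corresponding data register is read, so the unused registers can hold their contents without changing SIG.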
Power Estimation using the XPE Tool
The XPE spreadsheet is a power estimation tool that is typically used in the pre-design and
pre-implementation phases of a project. The XPE tool assists with architecture evaluation,
device selection, and the selection of appropriate power supply and thermal management
components for a specific application. In addition, settings are available in the tool to
estimate the power in a power-optimized mode. These settings are used to determine the
expected power savings when using power optimization while implementing a design.
A LUT flip-flop pair for this architecture represents one LUT paired with
one flip-flop within a slice. A control set is a unique combination of
clock, reset, set, and enable signals for a registered element. The Slice
Logic Distribution report is not meaningful if the design is over-mapped
for a non-slice resource or if placement fails. OVERMAPPING of block RAM
resources should be ignored if the design is over-mapped for a non-block
RAM resource or if placement fails.
IO Utilization:
--------------
=========================================================================
* Physical Synthesis Options Summary *
=========================================================================
---- Options
Global Optimization : OFF
Retiming : OFF
Equivalent Register Removal : OFF
Timing-Driven Packing and Placement : ON
Logic Optimization : OFF
Register Duplication : OFF
=========================================================================
=========================================================================
* Optimizations *
=========================================================================
---- Statistics
---- Details
[Figure 6: waveform of CLK, A, SEL, OUT, and A.Q. A toggles even when it is not being used downstream.]
[Figure 7: waveform of CLK, A, SEL, OUT, A.Q, and CE. A toggles only when it needs to.]
Additional Optimizations
Intelligent clock gating optimization also reduces power for dedicated block RAM in either
simple or dual-port mode. These blocks provide several enables: an array enable, a write
enable, and an output register clock enable. Most of the power savings comes from using the
array enable, as shown in Figure 10.
[Figure 10: before and after diagrams of a block RAM (labels: Address, Data In, CE).]
For example, in a block RAM followed by a 2-to-1 multiplexer, the optimization implements an
OR function in a LUT, combining the write enable (WER) and the select (PRESELECTR), and
connects the result to the ENARDEN port of the block RAM. The OR function keeps the block RAM
enabled only when data is being written or when its output is used (i.e., selected in the
multiplexer), so the block dissipates less power in all other cycles. Assuming a 50% toggle
rate on the write enable of the block RAM, this optimization shows a 25% reduction in dynamic
power. An example of the Verilog code is shown:
`timescale 1ns/1ps
module ram_mux #(
parameter RAM_WIDTH = 8,
RAM_ADDR_WIDTH = 12
) (
input SELECT, CLK, WE,
input [RAM_WIDTH-1:0] BYPASS,
input [RAM_ADDR_WIDTH-1:0] ADDR,
input [RAM_WIDTH-1:0] DATA_IN,
output reg [RAM_WIDTH-1:0] RESULT_OUT = {RAM_WIDTH{1'b0}}
);
endmodule // ram_mux
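One way to see how a reduction of this size can arise is to count the cycles in which the OR-ed enable is high. The Python sketch below rests on assumptions that the text does not state: that the write enable and the select are each asserted in 50% of cycles, that they are independent, and that block RAM dynamic power scales with the fraction of enabled cycles. The function name is hypothetical.

```python
from itertools import product


def enable_fraction(p_we, p_select):
    """Fraction of cycles in which EN = WE or SELECT is high.

    Assumes WE and SELECT are independent, each asserted with the
    given probability in any cycle.
    """
    total = 0.0
    for we, select in product((0, 1), repeat=2):
        p = (p_we if we else 1 - p_we) * (p_select if select else 1 - p_select)
        if we or select:
            total += p
    return total


always_on = 1.0                      # unoptimized: block RAM enabled every cycle
gated = enable_fraction(0.5, 0.5)    # 0.75 under the assumptions above
saving = 1 - gated / always_on       # 0.25

print(f"enabled {gated:.0%} of cycles, saving {saving:.0%}")
```

With other signal statistics the saving changes accordingly; for instance, a RAM that is rarely written and rarely selected would be disabled for most cycles.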
In that same version of the design tools, these optimizations can detect a clock enable
implemented as logic and replace it with a dedicated CE, as shown in Figure 11.
[Figure 11: before and after diagrams. Before: EN implemented as logic. After: EN drives the dedicated CE.]
Power Optimization
The recommended best practices for power savings using power optimization techniques are
described in this section.
[Figure 14: coding example in which Reg1 is reset and Reg2 is not.]

    always @ (posedge clk)
      if (rst)
        reg1 <= 1'b0;
      else
      begin
        reg1 <= a;
        reg2 <= b;
      end

• The code does not say what happens to Reg2 on reset.
• Consequence: RST steals Reg2's CE, blocking power optimization.
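To see why the reset ends up on the clock-enable pin of Reg2, the coding example above can be modeled behaviorally. The Python sketch below is only an illustration with hypothetical function names: it checks that a register with no assignment in the reset branch behaves exactly like a flip-flop whose CE is driven by the inverse of rst.

```python
def step_as_written(reg2, rst, b):
    """Next value of reg2 as coded: nothing is assigned in the reset branch."""
    if rst:
        return reg2   # value is held, so the hardware needs an enable
    return b


def step_with_ce(q, ce, d):
    """A flip-flop with a clock enable: loads d when ce is high, else holds."""
    return d if ce else q


# Exhaustive check over 1-bit values: the code as written is equivalent to
# a flip-flop whose CE pin is driven by NOT rst.
for reg2 in (0, 1):
    for rst in (0, 1):
        for b in (0, 1):
            assert step_as_written(reg2, rst, b) == step_with_ce(reg2, not rst, b)
```

Under this reading, assigning reg2 in the reset branch as well, or describing reg2 in a separate block that has no reset, would remove the implied hold and leave the CE available for power optimization.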
Block RAM
• The amount of power the block RAM consumes is directly proportional to the amount of
time it is enabled. To save power, the block RAM enable can be driven Low on clock cycles
when the block RAM is not used in the design. Both the block RAM enable rate and the
clock rate are important parameters that must be considered for power optimization.
• The NO_CHANGE mode should be used in the TDP mode if the output latches remain
unchanged during a write operation. This mode is the most power efficient. This mode is
not available in the SDP mode because it is identical in behavior to the WRITE_FIRST
mode.
Reference Design
The reference design files for this application note can be downloaded from:
https://secure.xilinx.com/webreg/clickthrough.do?cid=190051
Table 1 shows the reference design checklist for this application note.
Conclusion
As outlined in this application note and WP370, Reducing Switching Power with Intelligent
Clock Gating, FPGA designs can benefit from intelligent clock gating. Xilinx design tools readily
support these techniques and supply detailed analysis of the impact of clock gating on a design
from a logic design and power perspective.
Revision History
The following table shows the revision history for this document.

Date      Version  Description of Revisions
08/13/12  1.0      Initial Xilinx release.
Notice of Disclaimer
The information disclosed to you hereunder (the “Materials”) is provided solely for the selection and use of
Xilinx products. To the maximum extent permitted by applicable law: (1) Materials are made available "AS
IS" and with all faults, Xilinx hereby DISCLAIMS ALL WARRANTIES AND CONDITIONS, EXPRESS,
IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and (2)
Xilinx shall not be liable (whether in contract or tort, including negligence, or under any other theory of
liability) for any loss or damage of any kind or nature related to, arising under, or in connection with, the
Materials (including your use of the Materials), including for any direct, indirect, special, incidental, or
consequential loss or damage (including loss of data, profits, goodwill, or any type of loss or damage
suffered as a result of any action brought by a third party) even if such damage or loss was reasonably
foreseeable or Xilinx had been advised of the possibility of the same. Xilinx assumes no obligation to
correct any errors contained in the Materials or to notify you of updates to the Materials or to product
specifications. You may not reproduce, modify, distribute, or publicly display the Materials without prior
written consent. Certain products are subject to the terms and conditions of the Limited Warranties which
can be viewed at http://www.xilinx.com/warranty.htm; IP cores may be subject to warranty and support
terms contained in a license issued to you by Xilinx. Xilinx products are not designed or intended to be
fail-safe or for use in any application requiring fail-safe performance; you assume sole risk and liability for
use of Xilinx products in Critical Applications: http://www.xilinx.com/warranty.htm#critapps.
Automotive Applications Disclaimer
XILINX PRODUCTS ARE NOT DESIGNED OR INTENDED TO BE FAIL-SAFE, OR FOR USE
IN ANY APPLICATION REQUIRING FAIL-SAFE PERFORMANCE, SUCH AS APPLICATIONS
RELATED TO: (I) THE DEPLOYMENT OF AIRBAGS, (II) CONTROL OF A VEHICLE, UNLESS
THERE IS A FAIL-SAFE OR REDUNDANCY FEATURE (WHICH DOES NOT INCLUDE USE
OF SOFTWARE IN THE XILINX DEVICE TO IMPLEMENT THE REDUNDANCY) AND A
WARNING SIGNAL UPON FAILURE TO THE OPERATOR, OR (III) USES THAT COULD
LEAD TO DEATH OR PERSONAL INJURY. CUSTOMER ASSUMES THE SOLE RISK AND
LIABILITY OF ANY USE OF XILINX PRODUCTS IN SUCH APPLICATIONS.