Optimized Design of a Digital IQ Demodulator Suitable for
Adaptive Predistortion of 3rd Generation Base Station PAs
Chiheb Rebai, Haythem Ayari and Adel Ghazel
CIRTA’COM Research Unit
Ecole Supérieure des Communications de Tunis (SUP’COM)
2088 Cité Technologique des Communications, Tunis, Tunisie
chiheb.rebai@supcom.rnu.tn
Slim Boumaiza, Fadhel Ghannouchi
Intelligent RF Radio Laboratory
Electrical and Computer Engineering Department
Schulich School of Engineering, University of Calgary, Canada
fghannou@ucalgary.ca
Abstract—This paper presents an optimized design of a digital IQ demodulator intended for the implementation of the feedback path of an adaptive base band predistorter (DPD). The linearization capability of the DPD, in terms of correction bandwidth and minimization of nonlinear effects, depends directly on the accuracy and speed of the IQ demodulator. In this work, a digital IQ demodulator is designed, optimized and implemented in a Xilinx FPGA device, allowing high-speed processing at about 200 MHz with a substantial reduction in the FPGA gates used.
I. INTRODUCTION
With the rapid growth of high-volume multimedia data transfer in wireless communication networks, spectral efficiency is becoming more and more essential. For that reason, non-constant envelope (high peak-to-average ratio) digital modulation schemes (e.g. QPSK and QAM) and advanced access techniques (e.g. OFDM) have been adopted or are being deployed in many 2.5G and 3G wireless systems. In such a context, the nonlinearities that typify High Power Amplifiers (HPA) cause amplitude and phase distortions, inter-symbol interference and adjacent channel interference. These nonlinear effects considerably degrade the wireless link performance and make HPA linearity a challenging design issue. To address this problem, various linearization techniques have been proposed in the literature, namely feedback, feed-forward and predistortion. Such techniques can be implemented either in the analogue or in the digital domain. In addition, they can be applied to narrowband (feedback), wideband (digital predistortion) and broadband (analogue RF predistortion, feed-forward) signals. Base band digital predistortion is currently the most attractive technique. Indeed, as digital signal processors (DSP) and field programmable gate arrays (FPGA) become faster, real-time correction of HPA distortion through base band digital predistortion is becoming an increasingly viable solution.
In this paper, attention is focused on the FPGA design, optimization and implementation of the digital IQ demodulator required by a hybrid FPGA and DSP based solution that compensates for the HPA distortions. This paper is organized as follows: Section 2 presents the principles of adaptive base band DPD; the proposed system configuration is explained in section 3; the architecture of the digital I/Q demodulator is detailed in section 4; section 5 focuses on the design of the high-speed fractional resampler. The implementation results are discussed in section 6, and section 7 concludes the paper.
II. PRINCIPLE OF THE ADAPTIVE BASE BAND DPD TECHNIQUE
The Digital PreDistorter (DPD), designated in figure 1 as f, consists of the inverse function of the HPA nonlinear response g over the entire power dynamic range. As a result, the cascade of the DPD and the HPA produces the desired linear response. As shown in figure 1, the HPA response function g depends on the instantaneous magnitude of its input signal Vp; in other words, the HPA is assumed to be memoryless. Similarly, the predistortion function f is made a function of the magnitude of the input signal Vi. The circuitry that implements the predistortion function is simply referred to as the predistorter.
Figure 1. Generic architecture of the DPD.
To achieve a linear response from the cascade of the predistorter and the HPA, equation (1) has to be satisfied:
$$f(|V_i|)\, g(|V_p|) = k \tag{1}$$
where k is a constant that designates the linearized HPA gain, and
$$V_p = V_i\, f(|V_i|) \tag{2}$$
Substituting (2) into (1) leads to
$$f(|V_i|)\, g\big(|V_i\, f(|V_i|)|\big) = k \tag{3}$$
The predistorter function f that satisfies (3) can be synthesized iteratively by minimizing the cost function given in (4):
$$J = |V_e(n)|^2 \tag{4}$$
where the error signal is defined as:
$$V_e(n) = V_o(n) - k\, V_i(n) \tag{5}$$
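The iterative synthesis of f described by (1) to (5) can be sketched for a memoryless amplifier. This is a minimal sketch under stated assumptions, not the authors' DSP algorithm: Vo(n) is taken to be the amplifier output, the amplifier is a hypothetical Rapp-style AM/AM model (parameters g0, vsat and p are invented for illustration, AM/PM is ignored), and the LUT size, target gain k = 8 and step size mu are arbitrary. Each LUT entry is updated with a simple multiplicative fixed-point rule that drives Ve(n) in (5), and hence J in (4), toward zero at the centre of its amplitude bin.

```python
def pa_gain(amp, g0=10.0, vsat=1.0, p=2.0):
    """Hypothetical memoryless PA gain g(|Vp|) (Rapp-style AM/AM model).

    The PA output is Vo = Vp * g(|Vp|); AM/PM is ignored in this sketch.
    """
    return g0 / (1.0 + (g0 * amp / vsat) ** (2.0 * p)) ** (1.0 / (2.0 * p))


def adapt_lut(k=8.0, n_bins=32, vmax=0.12, iters=200, mu=0.5):
    """Adapt a gain LUT f(|Vi|), one entry per amplitude bin, so that
    f(|Vi|) * g(|Vp|) approaches the target linear gain k (eq. 1)."""
    lut = [1.0 + 0.0j] * n_bins               # start with no predistortion
    for _ in range(iters):
        for b in range(n_bins):
            vi = (b + 0.5) * vmax / n_bins    # test amplitude at the bin centre
            vp = vi * lut[b]                  # predistorted PA input, eq. (2)
            vo = vp * pa_gain(abs(vp))        # PA output
            ve = vo - k * vi                  # error signal, eq. (5)
            # ve / (k*vi) = f*g/k - 1: the entry shrinks when the cascade gain
            # is too high and grows when it is too low, so J = |ve|^2 (eq. 4)
            # is driven toward zero.
            lut[b] -= mu * lut[b] * ve / (k * vi)
    return lut


if __name__ == "__main__":
    lut = adapt_lut()
    for b in (0, 16, 31):
        print(b, round(abs(lut[b]), 4))
```

The target k·|Vi| must stay below the saturated output of the model (vsat = 1), otherwise no LUT entry can satisfy (3); this is why vmax is chosen so that k·vmax < 1. Near saturation the entries exceed 1 (expansion), while at low amplitude they settle at k/g0.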
1-4244-0395-2/06/$20.00 ©2006 IEEE. 573
III. OVERALL SYSTEM DESCRIPTION
Figure 2 shows the block diagram of the predistorter. The high-speed processing block runs in the FPGA, where the input I/Q samples are pre-processed in order to pre-compensate for the HPA nonlinearity. The predistortion function is implemented in the FPGA using a LUT. To update the LUT entries as the HPA behavior changes, a signal processing function that minimizes the cost function given in (4) is executed in the DSP. This function uses the HPA input and output signals, which are sampled and stored in synchronous dynamic random access memories (SDRAM) [2].
Since the quadrature components (Ip, Qp) of the HPA input signal can be obtained in the digital domain prior to digital modulation, the accuracy of the predistortion function synthesis and the correction bandwidth are dominated by the accuracy and the sampling speed of the feedback signal (Yf), which represents the HPA output. In this work, digital demodulation was chosen to obtain the quadrature components of the feedback signal (Yf) in order to avoid the impairments exhibited by radiofrequency IQ demodulators. Hence, the design of the digital IQ demodulator is critical for the synthesis of the predistortion function and for the linearization performance. Figure 3 shows the details of the digital IQ demodulator that has to be designed, optimized and implemented in the FPGA. This component has to combine high-speed processing with accurate signal demodulation.
Figure 2. Overall DPD system architecture.
Figure 3. FPGA block diagram.
IV. DIGITAL I/Q DEMODULATOR DESIGN
A. NCO
Figure 4 illustrates the simplified diagram of a Direct Digital Synthesis (DDS) device, which mainly includes a phase accumulator and a phase-to-amplitude converter (sine ROM). The phase accumulator consists of a j-bit clock-dividing counter that is incremented by the digital word P. The ROM is a 2^j-deep sine look-up table, which converts the digital phase information delivered by the phase accumulator into the values of a sine wave [5]. The DDS parameters are related by equation (6):
$$f_{out} = \frac{P \cdot f_{clk}}{2^{j}} \tag{6}$$
Figure 4. Block diagram of the one-tone output DDS.
According to figure 5, to deliver an output signal with fout = 70.3125 MHz, the DDS ROM depth should be at least 128 when fclk = 200 MHz and P = 45 (from (6), 45 × 200 MHz / 2^7 = 70.3125 MHz).
[Figure 5 plot: DDS output frequency (MHz) versus sampling frequency (190 to 210 MHz) for ROM depths N = 64, 128 and 256.]
Figure 5. Spectrum of the DDS output.
Since the DDS has to generate both the sine and the cosine signals required by the I/Q multiplier, the ROM is duplicated (2 x 128) and the DDS power consumption increases. Hence, reducing the ROM area is advantageous. Owing to the symmetry of the sine function, only π/2 rad of the sine function is stored, as shown in figure 6, and the sine look-up table samples for the full 2π rad range are generated using the quarter-wave symmetry of the sine function.
Figure 6. Logic to exploit quarter-wave symmetry.
In figure 6, the two most significant phase bits are used to decode the quadrant, while the remaining j-2 bits address a π/2 rad sine look-up table. The most significant bit determines the sign of the result, and the second most significant bit determines whether the amplitude is increasing or decreasing. The accumulator output is used "as is" for the first and third quadrants, whereas for the second and fourth quadrants its bits are complemented so that the slope of the sawtooth is inverted. Since there is no phase truncation, the purity of the DDS output signal depends only on the number of output bits. Thus, to achieve a maximum carrier-to-spur level of 72 dB, the output signal should be coded on 12 bits.
B. I/Q Multiplier
The I/Q multiplier demodulates the IF signal by mixing it with the sine and cosine signals produced by the one-tone DDS. In this work, the shift-and-add (CSA) algorithm is chosen for the implementation of the multiplier. In addition, a pipelined structure
was adopted in order to reduce the propagation delay of the multiplier, as shown in figure 7. Indeed, the number of intermediate products increases with the number of operand bits, which raises the propagation delay of the multiplier and limits the operating frequency.
Figure 7. Architecture of the multiplier.
V. FRACTIONAL RESAMPLER DESIGN
The objective of the fractional resampler is to convert the signal rate from 200 Msps to 160 Msps, which is needed to match the sampling rates of all the input signals of the main full state machine (figure 3). This is obtained by decimating the signal by 5 and interpolating it by 4. In addition, a low-pass filter (LPF) is used to remove the upper band of the signal (at 60 MHz) produced by the multiplier.
A. Decimator
The decimator reduces the sampling rate from 200 MHz to 40 MHz (downsampling by a factor M = 5). To reduce the sampling rate without spectral aliasing, the input signal must first be filtered by a digital LPF that approximates the ideal characteristic [6]:
$$H(e^{j2\pi f}) = \begin{cases} 1, & |f| < 20\ \text{MHz} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
In practice, an LPF with a cut-off frequency of 20 MHz and an out-of-band attenuation of 72 dB requires a high-order filter (54-tap FIR), which leads to high power consumption. To relax the design, cascading two lower-order LPFs is a valuable alternative. The first filter requires 34 taps with a cut-off frequency of 25 MHz, while the second needs only 27 taps with a rejection frequency of 15 MHz (this second filter performs no decimation).
The power consumption of a single decimation filter (54 taps) is proportional to 54·Fclk, whereas the two cascaded LPFs lead to a total power consumption proportional to 35·Fclk + 27·Fclk/5 ≈ 40·Fclk. Thus, about 25% of the power is saved (40/54 ≈ 0.75).
The structure of the first filter can be further optimized using polyphase decomposition, as shown in figure 8. To illustrate this, consider a set of N original filter coefficients a0, a1, …, aN-1 mapped into M (M = 5) polyphase sub-filters h0, h1, …, hM-1 according to (8):
$$h_i(r) = a(i + M r), \qquad i = 0, 1, \ldots, M-1, \quad r = 0, 1, \ldots, \left\lfloor \frac{N-1-i}{M} \right\rfloor \tag{8}$$
The polyphase segments are fed by delivering the input samples x(n) to their inputs through an input commutator, which starts at segment index i = M-1 and decrements to index 0. After the commutator has completed one cycle and delivered M input samples to the filter, a single output is taken as the sum of the outputs of the polyphase segments. The output sample rate Fs' is equal to Fs/M, so each polyphase segment operates at the low output sample rate Fs' rather than at Fs.
B. Interpolator
Interpolation can be performed by inserting zeros between the original samples [6]. Interpolation by a factor L (L = 2^n) requires inserting L-1 zeros after each original sample. The resulting images in the frequency spectrum are removed by a low-pass filter. As shown in figure 9, the interpolation factor in this work is 4; hence, two interpolation blocks of factor 2 are required. To reduce power consumption, the filter structure is optimized by exploiting the fact that only one out of every L input samples of the LPF is nonzero.
Figure 9. Interpolation by 4 structure.
C. Canonical-Signed-Digit encoder implementation
A two's complement representation is used for the filter input and the partial products. This requires extending the sign bit of each partial product up to the most significant bit of the corresponding intermediate result, which leads to higher power consumption. To minimize the number of partial products to be accumulated in the filter, all tap sets were encoded in canonical-signed-digit (CSD) representation, which reduces the number of non-zero bits. Since multiplication is the most computationally expensive operation in FIR filtering, simplifying the multiplications is highly desirable for a low-complexity design.
The CSD-encoded coefficients can be built using common sub-expressions that were identified and shared between all taps. The common sub-expressions used as an alphabet to generate the different coefficients were 101 ((x<<2)+x), 10-1 ((x<<2)-x), 1001 ((x<<3)+x) and 100-1 ((x<<3)-x), as shown in figure 10. All filters in the proposed solution are implemented in CSD format, so the resulting design is a multiplier-free architecture. To further increase the maximum operating frequency of the decimating filter (to reach 200 MHz), a modified structure with 2 pipeline registers is used, as shown in figure 10.
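The CSD and shared sub-expression scheme of section V.C can be illustrated with integer arithmetic. This minimal sketch converts a coefficient to CSD (non-adjacent form), then greedily groups its digits into the four sub-expressions listed in the text (5x, 3x, 9x, 7x) plus the plain input x, and forms the product with shifts, additions and subtractions only. The greedy grouping rule and the names to_csd, decompose and csd_multiply are hypothetical; they are not the structure of figure 10, and pipelining is not modelled.

```python
def to_csd(c):
    """CSD (non-adjacent form) digits of integer c, LSB first, each in {-1, 0, 1}."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)          # +1 when c = 1 (mod 4), -1 when c = 3 (mod 4)
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


# Shared sub-expressions named in the text (x is the integer filter input).
SUBEXPR = {
    1: lambda x: x,                  # isolated digit
    5: lambda x: (x << 2) + x,       # "101"
    3: lambda x: (x << 2) - x,       # "10-1"
    9: lambda x: (x << 3) + x,       # "1001"
    7: lambda x: (x << 3) - x,       # "100-1"
}


def decompose(c):
    """Hypothetical greedy grouping of CSD digits, scanned from the MSB, into
    (sign, sub-expression, shift) terms whose weighted sum equals c."""
    d = to_csd(c)
    terms, p = [], len(d) - 1
    while p >= 0:
        if d[p] == 0:
            p -= 1
        elif p >= 2 and d[p - 2] != 0:            # +-(1 0 +-1): 5x or 3x
            terms.append((d[p], 5 if d[p] == d[p - 2] else 3, p - 2))
            p -= 3
        elif p >= 3 and d[p - 3] != 0:            # +-(1 0 0 +-1): 9x or 7x
            terms.append((d[p], 9 if d[p] == d[p - 3] else 7, p - 3))
            p -= 4
        else:                                     # lone digit: shifted x
            terms.append((d[p], 1, p))
            p -= 1
    return terms


def csd_multiply(x, c):
    """Multiplier-free product c*x built only from shifts, adds and subtracts."""
    shared = {k: f(x) for k, f in SUBEXPR.items()}   # computed once for all taps
    acc = 0
    for sign, sub, shift in decompose(c):
        term = shared[sub] << shift
        acc = acc + term if sign > 0 else acc - term
    return acc
```

For example, decompose(45) returns [(1, 3, 4), (-1, 3, 0)], that is 45x = (3x << 4) - 3x: a single shared 3x term serves both digit pairs, so the tap costs two shifts and one subtraction once 3x is available.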
Figure 8. Polyphase decomposition.
Figure 10. FIR CSD implementation.
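The polyphase mapping of (8) and the input commutator of section V.A can be checked numerically: a decimating FIR computed at the full rate and then downsampled must give exactly the same outputs as the polyphase form, in which branch i only ever sees every M-th input sample. This is a sketch on plain integer lists; the function names are hypothetical and any tap set can be used.

```python
def fir_decimate_direct(x, a, M):
    """Reference: run the FIR at the full rate Fs, then keep every M-th output."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(0, len(x), M)]


def fir_decimate_polyphase(x, a, M):
    """Polyphase decimator built from eq. (8): h_i(r) = a(i + M r)."""
    h = [a[i::M] for i in range(M)]                  # sub-filters h_0 .. h_{M-1}
    n_out = (len(x) + M - 1) // M
    # Input commutator: in output period q, branch i receives x[qM - i], so the
    # oldest sample of the period goes to branch M-1 and the newest to branch 0.
    # Each branch sees one sample per output, i.e. it runs at Fs/M.
    branch_in = [[x[q * M - i] if q * M - i >= 0 else 0 for q in range(n_out)]
                 for i in range(M)]
    y = []
    for m in range(n_out):
        acc = 0
        for i in range(M):                           # sum of the branch outputs
            for r, tap in enumerate(h[i]):
                if m - r >= 0:
                    acc += tap * branch_in[i][m - r]  # low-rate convolution
        y.append(acc)
    return y
```

Both functions return identical sequences; the difference lies in where the work happens. In the polyphase form every multiply-accumulate runs once per output sample (at Fs/M), whereas the direct form evaluates the filter at Fs and discards four out of five results when M = 5.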
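Returning to the NCO of section IV.A, equation (6) and the quarter-wave logic of figure 6 can be checked with a small bit-accurate model. This is a sketch under stated assumptions: j = 7 (the 128-point period quoted for P = 45 and fclk = 200 MHz), a signed 12-bit output, and a half-step phase offset in the quarter-wave ROM so that complementing the address in the second and fourth quadrants lands exactly on the mirrored sample. The names sine_from_phase, cosine_from_phase and dds are hypothetical and do not describe the authors' FPGA implementation.

```python
import math

J_BITS = 7                       # accumulator width j: 2**7 = 128 phase steps per cycle
OUT_BITS = 12                    # output word length quoted in the text
AMP = 2 ** (OUT_BITS - 1) - 1    # 2047: full scale of a signed 12-bit sample
Q_SIZE = 2 ** (J_BITS - 2)       # 32 entries cover a quarter period (pi/2 rad)

# Quarter-wave ROM sampled at a half-step offset (an assumption of this sketch).
QUARTER_ROM = [round(AMP * math.sin(math.pi / 2 * (a + 0.5) / Q_SIZE))
               for a in range(Q_SIZE)]


def sine_from_phase(phase):
    """Map a j-bit phase word to a 12-bit sine sample via quarter-wave symmetry."""
    msb = (phase >> (J_BITS - 1)) & 1          # sign: quadrants 3 and 4 are negative
    second = (phase >> (J_BITS - 2)) & 1       # slope: quadrants 2 and 4 run backwards
    addr = phase & (Q_SIZE - 1)                # remaining j-2 bits
    if second:
        addr = ~addr & (Q_SIZE - 1)            # complement the address bits
    value = QUARTER_ROM[addr]
    return -value if msb else value


def cosine_from_phase(phase):
    """Cosine from the same ROM: cos(theta) = sin(theta + pi/2)."""
    return sine_from_phase((phase + Q_SIZE) & (2 ** J_BITS - 1))


def dds(P, n_samples):
    """One-tone DDS: a j-bit accumulator incremented by P on every clock."""
    acc, out = 0, []
    for _ in range(n_samples):
        out.append(sine_from_phase(acc))
        acc = (acc + P) & (2 ** J_BITS - 1)    # wrap modulo 2**j
    return out


if __name__ == "__main__":
    f_clk, P = 200e6, 45
    print("f_out =", P * f_clk / 2 ** J_BITS / 1e6, "MHz")   # eq. (6): 70.3125 MHz
    print(dds(P, 8))
```

With the quarter-wave table only 32 words are stored, and the cosine path reuses the same table through a quarter-period phase offset instead of a second ROM. As a rough check of the 12-bit figure, each output bit contributes about 6 dB, and 12 × 6.02 ≈ 72 dB.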
VI. IMPLEMENTATION RESULTS
The full design is implemented on a Xilinx VirtexII-Pro-P4 component. Table I summarizes the implementation results. The required performance is reached: 200 MHz for the decimator, 40 MHz for the interpolator and 200 MHz for the I/Q demodulator. The test signal is a 3-channel WCDMA signal with a chip rate of 3.84 Mcps and a bandwidth of 5 MHz. Figures 11, 12, 13, 14, 15 and 16 show the spectra at different stages of the processing chain.
TABLE I. IMPLEMENTATION RESULTS
Block | Parameter | Generic multiplier implementation | Proposed design implementation
Decimator | Slices / Flip-flops | 802 / 851 | 1645 / 2884
Decimator | 4-input LUTs | 826 | 2353
Decimator | Generic multipliers | 35 | 0
Decimator | Frequency (MHz) | 61.880 | 217.061
Interpolator | Slices / Flip-flops | 132 / 166 | 812 / 875
Interpolator | 4-input LUTs | 48 | 1051
Interpolator | Generic multipliers | 12 | 0
Interpolator | Frequency (MHz) | 57.668 | 181.025
I/Q demodulator | Slices / Flip-flops | 49 / 77 | 360 / 308
I/Q demodulator | 4-input LUTs | 18 | 587
I/Q demodulator | Generic multipliers | 2 | 0
I/Q demodulator | Frequency (MHz) | 187.877 | 213.167
VII. CONCLUSION
This paper proposed an optimized FPGA design of a digital IQ demodulator. The high speed achieved (200 Msps) and the low complexity of this demodulator when implemented in a Xilinx FPGA make it well suited to the feedback path of an adaptive digital predistorter intended for the linearization of wideband power amplifiers.
VIII. REFERENCES
[1] J. K. Cavers, "Amplifier Linearization Using a Digital Predistorter with Fast Adaptation and Low Memory Requirements," IEEE Transactions on Vehicular Technology, Vol. 39, No. 4, pp. 374-382, 1990.
[2] S. Boumaiza, J. Li, M. Jaidane and F. M. Ghannouchi, "Adaptive Digital/RF Predistortion Using Direct LUT Synthesis and a Non-Uniform Indexing Function with Built-in Dependence on the Power Amplifier Nonlinearity," IEEE Transactions on Microwave Theory and Techniques, Vol. 52, No. 12, pp. 2670-2677, Dec. 2004.
[3] C. Rebai, H. Zaoui, A. Ghazel, H. Ben Nasr, S. Boumaiza and F. Ghannouchi, "Embedded Software Implementation on an Adaptive Base Band Predistorter," ICECS 2005, Tunis, Tunisia, December 2005.
[4] C. Rebai, S. Labiadh, K. Grati, A. Ghazel, S. Boumaiza and F. Ghannouchi, "FPGA Building Blocks for a Hybrid Base Band Digital Predistorter Suitable for 3G Power Amplifiers," ICECS 2005, Tunis, Tunisia, December 2005.
[5] R. C. Meitzler, W. P. Millard and J. Hopkins, "A Direct Digital Frequency Synthesizer Prototype for Space Applications," University Applied Physics Laboratory, 11100 Johns Hopkins Road, Laurel, MD 20723, USA.
[6] "Interpolation and Decimation of Digital Signals: A Tutorial Review," Proceedings of the IEEE, Vol. 69, No. 3, March 1981.
[Figures 11 to 16: magnitude (dB) versus frequency (MHz) at successive processing stages.]
Figure 11. 3-channel WCDMA input signal (0 to 200 MHz).
Figure 12. Spectrum after I/Q demodulation (0 to 200 MHz).
Figure 13. Spectrum after the first decimation filter (0 to 40 MHz).
Figure 14. Spectrum after full decimation processing (0 to 40 MHz).
Figure 15. Spectrum after the first interpolation filter (0 to 80 MHz).
Figure 16. Spectrum after full interpolation processing (0 to 160 MHz).