
2019 IEEE International Workshop on Signal Processing Systems

Design and Implementation of a Neural Network Based Predistorter for Enhanced Mobile Broadband

Chance Tarver∗, Alexios Balatsoukas-Stimming†‡, and Joseph R. Cavallaro§
∗§ Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
† Department of Electrical Engineering, École polytechnique fédérale de Lausanne, Lausanne, Switzerland
‡ Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
Email: ∗ tarver@rice.edu, † a.k.balatsoukas.stimming@tue.nl, § cavallar@rice.edu

Abstract—Digital predistortion is the process of using digital signal processing to correct nonlinearities caused by the analog RF front-end of a wireless transmitter. These nonlinearities contribute to adjacent channel leakage, degrade the error vector magnitude of transmitted signals, and often force the transmitter to reduce its transmission power into a more linear but less power-efficient region of the device. Most predistortion techniques are based on polynomial models with an indirect learning architecture, which have been shown to be overly sensitive to noise. In this work, we use neural network based predistortion with a novel neural network training method that avoids the indirect learning architecture and that shows significant improvements in both the adjacent channel leakage ratio and error vector magnitude. Moreover, we show that, by using a neural network based predistorter, we are able to achieve a 42% reduction in latency and a 9.6% increase in throughput on an FPGA accelerator with 15% fewer multiplications per sample when compared to a similarly performing memory-polynomial implementation.

Index Terms—Digital predistortion, neural networks, FPGA.

I. INTRODUCTION

Efficiently correcting nonlinearities in power amplifiers (PAs) through digital predistortion (DPD) is critical for enabling next-generation mobile broadband, where there may be multiple radio frequency (RF) transmit (TX) chains arranged to form a massive multiple-input multiple-output (MIMO) system [1], as well as new waveforms with bandwidths on the order of 100 MHz in the case of mmWave communications [2]. Traditional DPDs use variations of the Volterra series [3], such as memory polynomials [4, 5]. These models consist of sums of various order polynomials and finite impulse response (FIR) filters to model the nonlinearities and the memory effects in a PA, respectively.

To learn the values of the parameters in a polynomial based model, an indirect learning architecture (ILA) is typically used in conjunction with some variation of a least squares (LS) fit of the data to the model [5]. In an ILA, a postinverse model of the predistorter is fitted based on the output of the PA [6, 7]. After learning the postinverter, the coefficients are copied to the predistorter. Although this simplifies the learning of the DPD coefficients, it has been shown to converge to a biased solution due to noise in the PA output [8, 9]. Moreover, the LS problem is often poorly conditioned [4]. In [10], a mobile graphics processing unit (GPU) was used to implement the polynomial DPD with I/Q imbalance correction from [4]. This GPU implementation used floating-point arithmetic and was able to avoid the challenges associated with the dynamic range requirements of memory polynomials. When implemented on an FPGA, a memory polynomial can be challenging due to the bit-widths that are necessary to perform the high-order exponentiation in fixed-point precision [11].

The overall DPD challenge has strong similarities to the problems encountered in in-band full-duplex (IBFD) communications [12–14], where a transceiver simultaneously transmits and receives on the same frequency, increasing the spectral efficiency of the communication system. However, this requires (among other techniques) digitally removing the significant self-interference from the received signal, which consists not only of the intended transmission but also of the nonlinearities added by the imperfections in the transmit chain, including the PA. In [15], the author used neural networks (NNs) to perform the self-interference cancellation and found that they could achieve similar performance to polynomial based self-interference cancellation. This work was later extended to create both FPGA and ASIC implementations of the NN-based self-interference canceller [16]. It was found that, due to the regular structure of the NN and the lower bit-width requirements, it can be implemented with both a higher throughput and lower resource utilization.

Inspired by the full-duplex NN work and the known problems of polynomial based predistortion with ILAs, we recently proposed in [17] to use NNs for the forward DPD application. NNs are a natural choice for this application as they are able to approximate any nonlinear function [18], making them a reasonable candidate for predistortion. The idea of using various NNs for predistortion has been explored in many works [19, 20]. However, the training method is unclear in [19], and their implementations require over ten thousand parameters. In [20], the training of the NN is done using an ILA, which can subject the learned predistorter to the same problems seen with all ILAs.

Contribution: In our previous work [17], we avoided the standard ILA and improved the overall performance by using a novel training algorithm where we first modeled the PA with a NN and then backpropagated through it to train a DPD NN. We extend that work here to show that not only do we improve performance when compared to polynomial based DPD, but we do so with reduced implementation complexity. Furthermore, to realize the gains of the NN DPD, we design a custom FPGA accelerator for the task and compare it to our own polynomial DPD accelerator.

Outline: The rest of the paper is organized as follows. In Section II, we give an overview of our DPD architecture and methods. In Section III, we compare performance/complexity tradeoffs for the DPD NN to polynomial based predistorters. In Section IV, we compare FPGA implementations for memory polynomial and NN predistortion. Finally, in Section V we conclude the paper.

The work of C. Tarver and J. R. Cavallaro was supported in part by the U.S. NSF under grants ECCS-1408370, CNS-1717218, and CNS-1827940, for the "PAWR Platform POWDER-RENEW: A Platform for Open Wireless Data-driven Experimental Research with Massive MIMO Capabilities." The work of A. Balatsoukas-Stimming was supported by the Swiss NSF project PZ00P2 179686.

Figure 1. Architecture of the NN DPD system. The signal processing is done in the digital baseband and focuses on PA effects. The DAC, up/downconverters, and ADC are not shown in this figure, though their impairments are also captured.

Figure 2. General structure of the DPD and PA neural networks. There are two input and output neurons for the real and imaginary parts of the signal, N neurons per hidden layer, and K hidden layers. The inputs are directly added to the output neurons so that the hidden layers concentrate on the nonlinear portion of the signal.

II. NEURAL NETWORK DPD ALGORITHM OVERVIEW

For the NN DPD system, we seek to place a NN based predistorter inline with the PA so that the cascade of the two is a linear system, as shown in Fig. 1. However, to train a NN, it is necessary to have training data, and in this scenario the ideal NN output is unknown; only the ideal PA output is known. To overcome this problem, we train a PA NN model to emulate the PA. We then backpropagate the mean squared error (MSE) through the PA NN model to update the parameters in the NN DPD [17].

A. Neural Network Architecture

We use a feed-forward NN that is fully-connected with K hidden layers and N neurons per hidden layer. The nonlinear activation applied in the hidden layers is chosen to be a rectified linear unit (ReLU), shown in (1), which can easily be implemented with a single multiplexer in hardware.

ReLU(x) = max(0, x)    (1)

The input and output data of the predistorter are complex-valued, while NNs typically operate on real-valued data. To accommodate this, we split the real and imaginary parts of each time-domain input sample, x(n), onto separate neurons. Although PA-induced nonlinearities are present in the transmitted signal, the relationship between the input and output data is still mostly linear. Although in principle a NN can learn this relationship given training data, this turns out to be difficult in practice [15]. As such, we implement a linear bypass in our NN that directly passes the inputs to the output neurons, where they are added to the output from the final hidden layer, as can be seen in Fig. 2. This way, the NN entirely focuses on the nonlinear portion of the signal.

B. Training

This work primarily focuses on the implementation and running complexity of the DPD application, which consists of inference on a pre-trained NN. The training is assumed to run offline; once the model is learned, significant updates will not be necessary, and occasional offline re-training to account for long-term variations would be sufficient.

In [17], we first use input/output data of the PA to train a NN to model the PA behavior. We then connect a second DPD NN to the PA NN model. We treat the combined DPD NN and PA NN as one large NN. However, during the second training phase, we only update the weights corresponding to the DPD NN. We then connect the DPD NN to the real PA and use it to predistort for the actual device.

The process of predistorting can excite a different region of the PA than when predistortion is not used. To account for this, it is not uncommon in other DPD methods to have multiple training iterations. A similar idea is adopted in [17] and in this work. Once training of the PA and the DPD is performed, we then retransmit through the actual PA while using the DPD NN. Using the new batch of input/output data, we can then update the PA NN model and in turn refine the DPD NN. An example of the iterative training procedure is shown in Fig. 3, where the MSE training loss is shown for the PA NN model and the combined DPD-PA over two training iterations.
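As a concrete illustration of this two-stage procedure, the following PyTorch sketch first fits a PA NN model to measured input/output data and then backpropagates through the frozen model to train the DPD NN. This is not the authors' code: the layer sizes, learning rate, linear target gain G, and the placeholder tensors x_tx and y_pa are assumptions made for illustration; only the Adam optimizer, the MSE loss, batches of 32 samples, and the identity linear bypass are taken from the paper.

# Illustrative sketch (not the authors' code) of the two-stage NN-DPD training
# from [17]: stage 1 fits a NN model of the PA; stage 2 freezes it and trains
# the DPD NN so that the cascade DPD -> PA-model behaves linearly.
import torch
import torch.nn as nn

class BypassMLP(nn.Module):
    """Fully connected NN with ReLU hidden layers and a linear bypass (Fig. 2)."""
    def __init__(self, n_hidden=14, k_layers=1):
        super().__init__()
        layers, d_in = [], 2
        for _ in range(k_layers):
            layers += [nn.Linear(d_in, n_hidden), nn.ReLU()]
            d_in = n_hidden
        self.hidden = nn.Sequential(*layers)
        self.out = nn.Linear(n_hidden, 2)

    def forward(self, x):                       # x: (batch, 2) = [Re, Im]
        return self.out(self.hidden(x)) + x     # W_linear fixed to the identity

def fit(model, x, target, epochs, through=None):
    """Minimize MSE; if `through` is given, backpropagate through that frozen model."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for xb, tb in zip(x.split(32), target.split(32)):   # batches of 32 samples
            y = model(xb)
            if through is not None:
                y = through(y)                  # cascade DPD -> PA NN model
            loss = nn.functional.mse_loss(y, tb)
            opt.zero_grad()
            loss.backward()
            opt.step()

# x_tx, y_pa: measured PA input/output as (num_samples, 2) real/imag tensors (assumed given)
pa_model, dpd = BypassMLP(), BypassMLP()
fit(pa_model, x_tx, y_pa, epochs=20)            # stage 1: model the PA
for p in pa_model.parameters():
    p.requires_grad_(False)                     # freeze the PA NN model
G = 1.0                                         # desired linear gain (assumed)
fit(dpd, x_tx, G * x_tx, epochs=20, through=pa_model)   # stage 2: train the DPD NN

Retransmitting through the real PA with the trained DPD and repeating both stages on the new data gives the iterative refinement described above.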
III. COMPLEXITY COMPARISON

To evaluate the NN based predistortion, we present the formulation of both a memory polynomial and the NN. We then derive expressions for the number of multiplications as a function of the number of parameters in the models. In most implementations, multiplications are considered to be more expensive, as they typically have higher latency and require more area and power. Additions typically have a minor impact on these metrics when compared to multiplications, so we omit them from this high-level analysis.

x̂(n) = Σ_{p=1, p odd}^{P} Σ_{m=0}^{M} α_{p,m} x(n−m)|x(n−m)|^{p−1} + Σ_{q=1, q odd}^{Q} Σ_{l=0}^{L} β_{q,l} x*(n−l)|x*(n−l)|^{q−1} + c    (2)

Figure 3. Example of iterative NN-DPD training for two training iterations, where 20 and 5 epochs are used in the first and second iteration, respectively.

A. Memory Polynomial Predistortion

An extension of a memory polynomial from [4] is shown in (2). This form of memory polynomial predistorts the complex baseband PA input x(n) to be x̂(n) by computing nonlinearities of the form x(n)|x(n)|^p and convolving them with an FIR filter for both x(n) and its conjugate, x*(n). This conjugate processing gives the model the expressive power to combat PA nonlinearities and any IQ imbalance in the system. P and M are the highest nonlinearity order and memory depth in the main branch, while Q and L are the highest order and memory in the conjugate branch. The complex-valued coefficients α_{p,m} and β_{q,l} represent the DPD coefficients that need to be learned for nonlinearity orders p and q and memory taps m and l. Finally, the DC term c accounts for any local oscillator leakage in the system.

The total number of complex-valued parameters in (2) is given as

n_{PAR,poly} = M((P+1)/2) + L((Q+1)/2) + 1.    (3)

Assuming three real multiplications per complex multiplication, we get the following number of multiplications

n_{MUL,poly} = 3 n_{PAR,poly} + Σ_{p=3, p odd}^{P} (p+5)/2 + Σ_{q=3, q odd}^{Q} (q+5)/2.    (4)

Here, each complex coefficient accounts for three multiplications. The expression x(n)|x(n)|^{p−1} is computed once for each n for a given p and delayed in the design to generate the appropriate value for each m. We note that |x(n)|^{p−1} can always be simplified to (ℜ(x(n))² + ℑ(x(n))²)^{(p−1)/2} since p is odd. This accounts for ((p−1)/2 + 1) multiplications before being multiplied by the complex-valued x(n), which adds 2 more multiplications. The same is true for the conjugate processing.
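As a concrete reading of (2)–(4), the short numpy sketch below evaluates the memory polynomial for a block of samples and counts the real multiplications per sample. It is illustrative only and not the authors' implementation; the coefficient containers and the use of a simple circular delay are assumptions, and the coefficient values would normally come from an LS fit.

# Illustrative numpy sketch of the memory polynomial in (2) and the counts in (3)-(4).
import numpy as np

def memory_polynomial(x, alpha, beta, c=0.0):
    """x: complex baseband input block.
    alpha[(p, m)], beta[(q, l)]: complex DPD coefficients keyed by odd order and tap."""
    x_hat = np.full_like(x, c, dtype=complex)
    for (p, m), a in alpha.items():                 # main branch
        xd = np.roll(x, m)                          # x(n - m), simple circular delay
        x_hat += a * xd * np.abs(xd) ** (p - 1)
    for (q, l), b in beta.items():                  # conjugate branch
        xd = np.conj(np.roll(x, l))                 # x*(n - l)
        x_hat += b * xd * np.abs(xd) ** (q - 1)
    return x_hat

def n_mul_poly(P, M, Q=0, L=0):
    """Real multiplications per sample, mirroring (3) and (4)."""
    n_par = M * (P + 1) // 2 + L * (Q + 1) // 2 + 1
    extra = sum((p + 5) // 2 for p in range(3, P + 1, 2)) \
          + sum((q + 5) // 2 for q in range(3, Q + 1, 2))
    return 3 * n_par + extra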
B. Neural Network Predistortion

The output of a densely connected NN is given by

h_1(n) = f(W_1 [ℜ(x(n)), ℑ(x(n))]^T + b_1),    (5)

h_i(n) = f(W_i h_{i−1}(n) + b_i),  i = 2, . . . , K,    (6)

z(n) = W_{K+1} h_K(n) + b_{K+1} + W_linear [ℜ(x(n)), ℑ(x(n))]^T,    (7)

x̂(n) = z_1(n) + 1j · z_2(n),    (8)

where f is a nonlinear activation function (such as the ReLU from (1)), W_i and b_i are the weight matrices and bias vectors corresponding to the ith layer in the NN, and j is the imaginary unit. The final output of the network after hidden layer K is given by (7), where the first element represents the real part of the signal and the second element represents the imaginary part. In (7), W_linear is a 2×2 matrix of the weights corresponding to the linear bypass. In practice, we fix it to be the identity matrix, I_2, to reduce complexity, though these weights could also be learned in systems with significant IQ imbalance.

Assuming N neurons per hidden layer and K hidden layers, the number of multiplications is given by

n_{MUL,NN} = 4N + (K − 1)N².    (9)
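A direct numpy reading of (5)–(8) with the identity bypass, together with the per-sample multiplication count from (9), is sketched below; the weight and bias values are placeholders, and this is not the authors' implementation.

# Illustrative numpy sketch of the NN predistorter in (5)-(8) with the identity
# linear bypass, plus the multiplication count from (9). Weights are placeholders.
import numpy as np

def nn_predistort(x, Ws, bs):
    """x: a single complex sample; Ws/bs: K+1 weight matrices and bias vectors."""
    v = np.stack([x.real, x.imag])              # 2-element real input vector
    h = v
    for W, b in zip(Ws[:-1], bs[:-1]):          # K hidden layers, (5)-(6)
        h = np.maximum(0.0, W @ h + b)          # ReLU activation
    z = Ws[-1] @ h + bs[-1] + v                 # output layer + identity bypass, (7)
    return z[0] + 1j * z[1]                     # reassemble the complex output, (8)

def n_mul_nn(N, K):
    """Real multiplications per sample from (9)."""
    return 4 * N + (K - 1) * N ** 2

For example, the K = 1, N = 6 design point used later in Section IV evaluates to 4N = 24 real multiplications per sample under (9).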
C. Results

The performance results for each predistorter as a function of the number of required multiplications are shown in Figs. 4–6. These results were obtained using the RFWebLab platform [21]. RFWebLab is a web-connected PA at Chalmers University. This system uses a Cree CGH40006-TB GaN PA with a peak output power of 6 W. The precision is 14 bits for the feedback on the ADC and 16 bits for the DAC.
Assuming three real multiplications per complex multiplica- with a peak output power of 6 W. The precision is 14 bits

Figure 4. ACLR vs. number of multiplications for NN DPD (shown with diamonds) with up to K = 2 hidden layers and memory polynomial (shown with circles) with up to M = 4 memory taps. This represents the out-of-band performance of the predistorter. The stars represent design points that we implement in FPGA in the next section.

Figure 5. EVM vs. number of real multiplications for NN DPD (shown with diamonds) with up to K = 2 hidden layers and memory polynomial (shown with circles) with up to M = 4 memory taps. This represents the in-band performance of the predistorter. The stars represent design points that we implement in FPGA in the next section.

Using their MATLAB API, we test the NN predistorter using a 10 MHz OFDM signal. This signal has random data on 600 subcarriers spaced apart by 15 kHz and is similar to the LTE signals commonly used in cellular deployments. It provides an interesting test scenario in that it has a sufficiently high peak-to-average power ratio (PAPR) to make predistortion challenging. We train on 10 OFDM symbols, then validate and present experimental results based on averaging over 10 different symbols. The Adam optimizer is used with an MSE loss function and batches of 32 samples. ReLU activation functions are used in the hidden layer neurons.
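A waveform of this type can be approximated with a few lines of numpy, as sketched below. The paper only fixes the bandwidth, the 600 active subcarriers, and the 15 kHz spacing; the QPSK mapping, the 1024-point FFT (giving a 15.36 MHz sample rate), and the absence of a cyclic prefix here are assumptions made for illustration.

# Illustrative sketch of an LTE-like 10 MHz OFDM test symbol: 600 active
# subcarriers at 15 kHz spacing. The QPSK mapping and 1024-point FFT are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_fft, n_active = 1024, 600                     # 1024 * 15 kHz = 15.36 MHz sample rate
bits = rng.integers(0, 2, size=(n_active, 2))
qpsk = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

grid = np.zeros(n_fft, dtype=complex)
grid[1:n_active // 2 + 1] = qpsk[:n_active // 2]        # positive subcarriers (DC unused)
grid[-n_active // 2:] = qpsk[n_active // 2:]            # negative subcarriers
x = np.fft.ifft(grid) * np.sqrt(n_fft)                  # time-domain OFDM symbol

papr_db = 10 * np.log10(np.max(np.abs(x) ** 2) / np.mean(np.abs(x) ** 2))
print(f"PAPR of this symbol: {papr_db:.1f} dB")         # the high PAPR is what makes DPD hard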
Specifically, we tested the following DPDs: (1) a NN DPD with K = 1 and N = {1, ..., 20, 25, 31} (dark green), (2) a NN DPD with K = 2 and N = {1, ..., 8} (light green), (3) a polynomial DPD without memory and with P = 1 to P = 13 (dark blue), (4) a polynomial DPD with M = 2 memory taps and with P = 1 to P = 13 (light blue), and (5) a polynomial DPD with M = 4 memory taps and with P = 1 to P = 13 (pink). For each of these polynomials, no conjugate processing was used, hence Q, L = 0. A predistorter with M = L = 4 and Q = P was also evaluated. However, the system did not have significant IQ imbalance, so the addition of the conjugate processing to the memory polynomial only had the effect of significantly increasing complexity. All DPDs were evaluated in terms of the adjacent channel leakage ratio (ACLR), the error vector magnitude (EVM), and the spectra of the post-PA pre-distorted signals.

1) Out-of-band performance: To measure the out-of-band performance, which is often the metric of most interest given Federal Communications Commission (FCC) regulations and 3GPP standards, we compute the ACLR as

ACLR = 10 log_10 (P_adjacent / P_channel),    (10)

where P_channel is the signal power in the main channel and P_adjacent is the signal power in the remainder of the band.

Figure 6. Example spectrum for the M = 4 polynomial and the K = 1 NN. Each of these uses around 80 multiplications per time-domain input sample for the DPD.

In Fig. 4, we observe that the NN DPD offers similar performance to the memoryless polynomial DPD for low numbers of multiplications, and it is able to significantly outperform all polynomial DPDs as the number of multiplications increases.

2) In-band performance: Although the primary goal of predistortion is to reduce spectral regrowth around the main carrier, predistortion also reduces the EVM of the main signal. Reducing EVM can improve reception quality and is hence a desirable result. The EVM is computed as

EVM = (‖ŝ − s‖ / ‖s‖) × 100%,    (11)

where s is the vector of all original symbols mapped onto complex constellations on OFDM subcarriers in the frequency domain, ŝ is the corresponding received vector after passing through the PA, and ‖·‖ represents the ℓ2 norm.
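A minimal numpy sketch of how (10) and (11) could be evaluated from a captured waveform and demodulated symbols is shown below; treating everything outside a 10 MHz main channel as the adjacent band is an assumption, since the paper does not spell out its measurement bandwidths.

# Illustrative sketch of the metrics in (10) and (11). Delimiting the adjacent band
# as everything outside a 10 MHz main channel is an assumption.
import numpy as np

def aclr_db(y, fs, bw=10e6):
    """ACLR per (10): adjacent-band power over main-channel power, in dB."""
    Y = np.fft.fftshift(np.fft.fft(y))
    f = np.fft.fftshift(np.fft.fftfreq(len(y), d=1 / fs))
    psd = np.abs(Y) ** 2
    in_band = np.abs(f) <= bw / 2
    p_channel = psd[in_band].sum()
    p_adjacent = psd[~in_band].sum()
    return 10 * np.log10(p_adjacent / p_channel)

def evm_percent(s_hat, s):
    """EVM per (11): normalized error between received and reference symbols."""
    return 100 * np.linalg.norm(s_hat - s) / np.linalg.norm(s)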

In Fig. 5, we see the EVM versus the number of multiplications for each of the predistorters. As the number of multiplications increases, the EVM decreases, as expected. The memoryless polynomial DPD is able to achieve a low EVM for the smallest number of multiplications. However, the complexity is only slightly higher for the NN based DPD, which is able to achieve an overall better performance than all other examined polynomial DPDs.

3) Spectrum Comparison: The spectra for both the memory polynomial and the NN DPDs are shown in Fig. 6. Here, both predistorters have the same running complexity of 80 multiplications per time-domain input sample. However, the NN is able to provide an additional 2.8 dB of suppression at ±20 MHz.
IV. FPGA ARCHITECTURE OVERVIEW

In this section, we compare a NN DPD accelerator with a memory polynomial based implementation. We implement both designs in Xilinx System Generator and target the Zynq UltraScale+ RFSoC ZCU1285 evaluation board. For the sake of this architecture comparison, we implement each to be fully parallelized and pipelined so as to compare the highest-throughput implementations of each. Based on the previous analysis, we implement both with 16-bit fixed-point precision throughout.

We synthesize FPGA designs targeting two separate ACLRs. First, we target an ACLR of approximately -31.4 dB. This target is achieved with a NN with N = 6 neurons and K = 1 hidden layer and with a 7th-order memoryless polynomial. Second, we target a more aggressive ACLR below -32 dB. This is done with a NN with N = 14 neurons and K = 1 hidden layer. A memory polynomial with M = 2 and P = 11 is also used to achieve this.

A. Neural Network Accelerator

We implement the NN DPD on the FPGA with the goal of realizing high throughput via maximum parallelization and pipelining. The top-level overview of the design is shown in Fig. 7. Here, each wire corresponds to a 16-bit bus. The real and imaginary parts of the PA input signal stream in each clock cycle. Weights are stored in a RAM which can be written to from outside the FPGA design. After the RAM is loaded, the weights and biases are written to individual registers in the neuron processing elements (PEs), which cache them for fast access during inference. A chain of pipeline registers passes the inputs to the output to be added to the output of the final layer.

Figure 7. General structure of the NN FPGA implementation.

After the weights are loaded into RAM, the RAM controller loads each of the weights into a weights cache in each PE. To do this, a counter increments through each address in the RAM. The current address and the value at that address are broadcast to all neurons. Each address corresponds to a specific weight or bias. Whenever the weights cache in a neuron reads addresses corresponding to the weights and biases for its neuron, it saves the data into a register dedicated to that parameter. These registers output to the corresponding multiplier or adder.

An example neuron PE is shown in Fig. 8. Each PE is implemented with a sufficient number of multipliers to perform the multiplication of the weights by the inputs in parallel. The results from each multiplier are added together along with the bias and passed to the ReLU activation function, which is implemented with a single multiplexer.

Figure 8. Example structure of a PE for the ith neuron in hidden layer 1.
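To make the datapath concrete, the following bit-accurate Python sketch models one hidden-layer-1 PE from Fig. 8 with 16-bit fixed-point operands; the Q2.14 format and the saturation behavior are assumptions, since the paper only states that 16-bit precision is used throughout, and this is not the authors' RTL.

# Bit-accurate sketch (not the authors' RTL) of a hidden-layer-1 neuron PE from Fig. 8:
# two weight multiplies, bias add, and a ReLU realized as a sign-based select
# (the "single multiplexer"). The Q2.14 fixed-point format is an assumption.
FRAC = 14                                   # fractional bits in the assumed Q2.14 format

def to_fix(v):
    return int(round(v * (1 << FRAC)))      # quantize a float to fixed point

def sat16(v):
    return max(-(1 << 15), min((1 << 15) - 1, v))   # saturate to 16 bits

def pe_layer1(x_re, x_im, w_re, w_im, bias):
    """All arguments are 16-bit fixed-point integers; returns h_{1,i}(n)."""
    acc = (x_re * w_re + x_im * w_im) >> FRAC       # two multiplies, rescaled
    acc = sat16(acc + bias)                         # add the cached bias
    return acc if acc > 0 else 0                    # ReLU as a 2:1 mux on the sign bit

h = pe_layer1(to_fix(0.25), to_fix(-0.5), to_fix(0.75), to_fix(0.1), to_fix(0.05))
print(h / (1 << FRAC))                              # back to a float for inspection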
B. Polynomial Accelerator

The memory polynomial is also implemented using 16 bits throughout the design. We target the design for maximum throughput by fully parallelizing and pipelining it so that a new time-domain input sample can be streamed in each clock cycle. The main overall structure of the design is shown in Fig. 9. Each polynomial "branch" of the memory polynomial corresponding to nonlinear order p computes x(n)|x(n)|^{p−1}, and there is a branch for each p in the design. The computation from each branch is passed to an FIR filter with complex taps. Three multiplications are used for each complex multiplication in each filter. A RAM is implemented to interface with an outside controller for receiving updated weights. Once the coefficients α and β are loaded into the design, they can be moved from the RAM to registers near each multiplier, similarly to the cache implemented in the NN design.

Figure 9. General structure of the high-throughput, low-latency memory polynomial FPGA implementation.

Table I
COMPARISON OF PERFORMANCE AND FPGA UTILIZATION

Metric                  | ACLR: -31.4 dB         | ACLR: -32 dB
                        | N=6, K=1  | P=7, M=1   | N=14, K=1 | P=11, M=2
Num. of Params.         | 32        | 8          | 72        | 24
LUT                     | 379       | 539        | 688       | 1424
LUTRAM                  | 16        | 120        | 16        | 224
FF                      | 538       | 991        | 1170      | 2730
DSP                     | 24        | 27         | 56        | 66
Worst Neg. Slack (ns)   | 8.72      | 8.68       | 8.49      | 8.34
Max. Freq. (MHz)        | 783       | 756        | 661       | 603
Max. T/P (MS/s)         | 783       | 756        | 661       | 603
Latency (CC)            | 12        | 21         | 14        | 26
C. Results

The Xilinx Vivado post-place-and-route utilization results are shown in Table I. Overall, the NN-based design offers numerous advantages over the memory polynomial. Specifically, for the target of an ACLR below -32 dB, the NN requires 48% of the lookup tables (LUTs) and 42% of the flip-flops (FFs), with a 15% reduction in the number of digital signal processors (DSPs). In terms of timing, there is a 9.6% increase in throughput with a 46% decrease in latency. These reductions in utilization occur while also seeing improved ACLR.
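The relative savings quoted above follow directly from Table I; a small worked computation for the -32 dB column is shown below for reference.

# Worked arithmetic for the -32 dB target in Table I: NN (N=14, K=1) vs.
# memory polynomial (P=11, M=2).
nn = {"LUT": 688,  "FF": 1170, "DSP": 56, "fmax_mhz": 661, "latency_cc": 14}
mp = {"LUT": 1424, "FF": 2730, "DSP": 66, "fmax_mhz": 603, "latency_cc": 26}

print(f"LUTs:       {100 * nn['LUT'] / mp['LUT']:.1f}% of the polynomial design")   # ~48.3%
print(f"FFs:        {100 * nn['FF'] / mp['FF']:.1f}%")                              # ~42.9%
print(f"DSPs:       {100 * (1 - nn['DSP'] / mp['DSP']):.1f}% fewer")                # ~15.2%
print(f"Throughput: {100 * (nn['fmax_mhz'] / mp['fmax_mhz'] - 1):.1f}% higher")     # ~9.6%
print(f"Latency:    {100 * (1 - nn['latency_cc'] / mp['latency_cc']):.1f}% lower")  # ~46.2%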
V. CONCLUSIONS

In this paper, we explored the complexity/performance tradeoffs for a novel NN based DPD and found that the NN could outperform memory polynomials, offering overall unrivaled ACLR and EVM performance. Furthermore, we implemented each on an FPGA and found that the regular matrix-multiply structure in the NN based predistorter led to a lower-latency design with less hardware utilization when compared to a similarly performing polynomial-based DPD.

This work opens up many avenues for future work. It can be extended to compare performance/complexity tradeoffs for more devices with a wider variety of signals, including different bandwidths and multiple component carriers. It is also possible to include memory cells, such as those of recurrent neural networks (RNNs), in the NN to account for memory effects. The NN is naturally well suited for a GPU implementation, which would be interesting in software-defined radio (SDR) systems. The NN complexity could also be further reduced with pruning, and the accuracy could potentially be improved with retraining after quantization and pruning.

REFERENCES

[1] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, "Massive MIMO for next generation wireless systems," IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
[2] W. Roh et al., "Millimeter-wave beamforming as an enabling technology for 5G cellular communications: Theoretical feasibility and prototype results," IEEE Commun. Mag., vol. 52, no. 2, pp. 106–113, Feb. 2014.
[3] A. Zhu, M. Wren, and T. J. Brazil, "An efficient Volterra-based behavioral model for wideband RF power amplifiers," in IEEE MTT-S Int. Microw. Symp. Digest, vol. 2, June 2003, pp. 787–790.
[4] L. Anttila, P. Handel, and M. Valkama, "Joint mitigation of power amplifier and I/Q modulator impairments in broadband direct-conversion transmitters," IEEE Trans. Microw. Theory Techn., vol. 58, no. 4, pp. 730–739, Apr. 2010.
[5] A. Katz, J. Wood, and D. Chokola, "The evolution of PA linearization: From classic feedforward and feedback through analog and digital predistortion," IEEE Microw. Mag., vol. 17, no. 2, pp. 32–40, Feb. 2016.
[6] A. Balatsoukas-Stimming, A. C. M. Austin, P. Belanovic, and A. Burg, "Baseband and RF hardware impairments in full-duplex wireless systems: Experimental characterisation and suppression," EURASIP Journal on Wireless Commun. and Networking, vol. 2015, no. 142, 2015.
[7] D. Korpi, L. Anttila, and M. Valkama, "Nonlinear self-interference cancellation in MIMO full-duplex transceivers under crosstalk," EURASIP Journal on Wireless Commun. and Networking, vol. 2017, no. 1, p. 24, Feb. 2017.
[8] D. Zhou and V. E. DeBrunner, "Novel adaptive nonlinear predistorters based on the direct learning algorithm," IEEE Trans. on Signal Processing, vol. 55, no. 1, pp. 120–133, Jan. 2007.
[9] R. N. Braithwaite, "A comparison of indirect learning and closed loop estimators used in digital predistortion of power amplifiers," in IEEE MTT-S Int. Microw. Symp., May 2015, pp. 1–4.
[10] K. Li et al., "Mobile GPU accelerated digital predistortion on a software-defined mobile transmitter," in IEEE Global Conf. on Signal and Inform. Process. (GlobalSIP), Dec. 2015, pp. 756–760.
[11] M. Younes, O. Hammi, A. Kwan, and F. M. Ghannouchi, "An accurate complexity-reduced "PLUME" model for behavioral modeling and digital predistortion of RF power amplifiers," IEEE Trans. on Ind. Electron., vol. 58, no. 4, pp. 1397–1405, Apr. 2011.
[12] M. Jain et al., "Practical, real-time, full duplex wireless," in Proc. Int. Conf. on Mobile Comput. and Netw., ACM, 2011, pp. 301–312.
[13] M. Duarte, C. Dick, and A. Sabharwal, "Experiment-driven characterization of full-duplex wireless systems," IEEE Trans. Wireless Commun., vol. 11, no. 12, pp. 4296–4307, Dec. 2012.
[14] D. Bharadia, E. McMilin, and S. Katti, "Full duplex radios," in ACM SIGCOMM, 2013, pp. 375–386.
[15] A. Balatsoukas-Stimming, "Non-linear digital self-interference cancellation for in-band full-duplex radios using neural networks," in IEEE Int. Workshop on Signal Processing Advances in Wireless Commun. (SPAWC), June 2018, pp. 1–5.
[16] Y. Kurzo, A. Burg, and A. Balatsoukas-Stimming, "Design and implementation of a neural network aided self-interference cancellation scheme for full-duplex radios," in Asilomar Conf. on Signals, Systems, and Comput., Oct. 2018, pp. 589–593.
[17] C. Tarver, L. Jiang, A. Sefidi, and J. Cavallaro, "Neural network DPD via backpropagation through a neural network model of the PA," in Asilomar Conf. on Signals, Systems, and Comput., (to appear).
[18] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251–257, 1991. [Online]. Available: http://www.sciencedirect.com/science/article/pii/089360809190009T
[19] R. Hongyo, Y. Egashira, T. M. Hone, and K. Yamaguchi, "Deep neural network-based digital predistorter for Doherty power amplifiers," IEEE Microw. and Wireless Compon. Letters, vol. 29, no. 2, pp. 146–148, Feb. 2019.
[20] M. Rawat and F. M. Ghannouchi, "Distributed spatiotemporal neural network for nonlinear dynamic transmitter modeling and adaptive digital predistortion," IEEE Trans. Instrum. Meas., vol. 61, no. 3, pp. 595–608, Mar. 2012.
[21] "RF WebLab." [Online]. Available: http://dpdcompetition.com/rfweblab/
