Comparison of Parallel and Successive Interference
Cancellation for Non-Orthogonal Multiple Access
               Talgat Manglayev                       Refik Caglar Kizilirmak                           Yau Hee Kho
            Department of Electrical                   Department of Electrical                    School of Engineering
          and Computer Engineering                   and Computer Engineering                      and Computer Science
             Nazabayev University                       Nazabayev University                  Victoria University of Wellington
        Astana Z05H0P9, Kazakhstan                  Astana Z05H0P9, Kazakhstan                 Wellington 6140, New Zealand
         talgat.manglayev@nu.edu.kz                  refik.kizilirmak@nu.edu.kz                        yhkho@ieee.org
   Abstract—Non-orthogonal multiple access (NOMA) is a                   challenging decoding problem in NOMA networks, it has
promising method for the fifth generation (5G) cellular networks          some drawbacks. Firstly, it carries dependency on correct
as it provides improved spectral efficiency by multiplexing users         decoding during each iteration, i.e., an error made in one
in power domain. One key challenge for the receivers in NOMA
networks is to distinguish the individual signals that use the           iteration may propagate the other ones. Secondly, user signals
same band at the same time. Currently, the two widely dis-               have to wait to be decoded until the decoding of signals
cussed decoding schemes are successive interference cancellation         of previous users are complete. Another drawback of SIC is
(SIC) and parallel interference cancellation (PIC). Both schemes         that for correct decoding, power allocation between the user
suppress the multi-user interference by subtracting the decoded          signals should be accurately defined. Even though SIC with
signals from the received signal based on different algorithms, i.e.,
SIC decodes iteratively and PIC decodes collectively. This paper         NOMA is being promised, it remains impractical for massive
compares the computation time of SIC and PIC schemes at the              deployment.
base station and demonstrates multi-thread implementation of                In contrast, when PIC is employed, the receiver decodes
PIC.                                                                     each user signal at the same time in parallel. Its plain architec-
   Index Terms—5G, NOMA, SIC, PIC, Parallel programming,                 ture and ease of implementation overcome the aforementioned
GPU, CUDA
                                                                         deficiencies for SIC. In PIC, removing multi-user interference
                                                                         and decoding the signal are conducted in parallel [7] [8] [9].
                       I. I NTRODUCTION
                                                                            When we compare SIC and PIC schemes, it is clear that
   Developments in information technologies and growth of                decoding of each user signal in SIC is mutually dependant,
mobile devices put new requirements on wireless commu-                   while for PIC, decoding for each user is independent of each
nications. Major requirements put upon 5G are caused by                  other and cancellation tasks may be distributed to multi-core
popularity of internet of things and mobile internet. Modern             processors. Multi-core processors run programs concurrently
consumers of wireless communication services such as vir-                and independently of each other. One task may be split into
tual reality, augmented reality and e-health services demand             sub-tasks equal to number of cores and final results from each
enhanced broadband, massive machine-type communications,                 core are gathered. Results of sub-tasks must be independent
low latency and spectrum efficiency [1] [2] [3]. System re-              of each other so that they can be run separately.
sources are used to reach high spectrum efficiency with several             If number of users in a NOMA system grows, heavy
different multiple access schemes. These are multiple access             interference cancellations become time consuming. Such com-
schemes implemented in different standards; time division                plex computations may be optimized by running parallel sub
multiple access (TDMA), frequency division multiple access               tasks on graphical processing unit (GPU), rather than central
(FDMA), code division multiple access (CDMA) implemented                 processing unit (CPU). GPU is widely used to solve various
in the third generation (3G) and orthogonal frequency division           general purpose problems as it dedicates a larger fraction of
multiple access (OFDMA) used in the latest fourth generation             hardware resources to computation than CPU does [10].
(4G) networks.                                                              In this work, our aim is to compare computation time of
   In NOMA, which is a candidate multiple access scheme,                 SIC and PIC, and show that structure of PIC enables faster
signals are multiplexed in power domain where individual                 execution than SIC. In reaching our aim, we also demonstrate
signals are distinguished by their power levels. The receiver            multi-threaded implementation of PIC. Multi-threading allows
decodes the signals by either successive interference cancel-            us to use hardware more efficiently and run the work even
lation (SIC) or parallel interference cancellation (PIC) [4] [5]         faster. Multi-threading comparison is done on CPU with Java
[6].                                                                     programming language and on GPU with CUDA architecture.
   When SIC is employed, the receiver decodes user signals               The rest of the paper is organized as follows. In Section II
iteratively by subtracting each decoded signal from the re-              we introduce SIC and PIC decoders, then in Section III we
ceived signal. Although, SIC has a potential to solve the                present our numerical results and finally we conclude our work
978-1-5386-5928-1/18/$31.00 2018
                            c     IEEE                                  74
in Section IV.
                          II. S YSTEM M ODEL
   In this paper, we consider uplink transmission for a single
cell NOMA network. In the network, all users transmit signals
at the same time and in the same band simultaneously via
different paths. For K user equipments (UEs), the received
                                                                                             Fig. 1. Block diagram of a typical SIC scheme.
signal by the base station (BS) can be written as
                         K
                                     
                y(t) =         gk (    ak PT xk (t)) + n(t)                (1)                                      K
                         k=1                                                                     xk (t) = y(t) −       k=1 sˆi ,
                                                                                                                       k=i                   (2)
where gk is the channel attenuation coefficient between UE                                       i = 1, 2, ..., k − 1, k + 1, ..., K.
k and the BS, xk (t) is the modulated waveform transmitted
from UE k, ak is power coefficient for UE k assigned by the                         In (2), decoded messages for K UEs (except the desired kth
BS, PT is the overall power for signals of all UEs and n(t)                       UE) are summed and subtracted from the received signal y(t).
is the additive white Gaussian noise (AWGN) at the BS with                        Then the result is decoded and message for UE k is obtained.
zero mean and density N0 (W/Hz).                                                  As can be seen, this summing operations and then subtractions
                                                                                  may take place in parallel [16].
A. Power Allocation
   In SIC, order of decoding depends on the power allocation                                       III. N UMERICAL R ESULTS
between each UEs. In other words, the user signal with the                           In this section, we compare the computation time of PIC
highest received power is decoded first. In [11], spectral                        and SIC schemes for different number of UEs in a single cell
efficiency of the network is maximized by optimizing the                          NOMA network. We considered single carrier transmission
power allocation among UEs in NOMA uplink channels. In                            with QPSK and 16-QAM and maximum likelihood decoding
this work, we also optimize the power allocation coefficients                     so that higher modulation order requires higher decoding time.
αk ’s in decoding the received signals with SIC. The same                         Maximum available power for UEs PT = 23 dBm. Frequency
methodology described in [12] is followed. In contrast, in PIC,                   is 1 GHz and users are distributed randomly in the coverage
we assume that power is equally distributed among UEs and                         area of the cell. Channel gains are obtained with Okumura-
αk ’s are set equal to each other. 1                                              Hata propagation model. The noise density N0 is taken as
                                                                                  10−17 W/Hz. Calculations were done on a machine with
B. Successive Interference Cancellation
                                                                                  following hardware parameters: CPU Intel(R) Xeon(R) CPU
   Fig. 1 shows the block diagram of a SIC receiver. In SIC,                      E5620 @ 2.40GHz, 5GB RAM Memory 1333 MHz DDR3,
BS has the knowledge of power allocation coefficients (αk ’s)                     NVIDIA TITAN Xp with 3840 CUDA Cores and Boost Clock
and channel gains (gk ’s) for each UE. The first decoded signal                   of 1582 MHz. Simple tic toc MATLAB functions were used
by the BS would be the signal for UE 1 and the remaining part                     for results in Fig. 3. Version 8 of Java programming language
is considered as multi-user interference. BS then subtracts the                   for running PIC via streams in parallel threads on multi cores
decoded signal from the received signal and the signals of the                    in Fig. 4. Java allows running parallel tasks using threads,
rest UEs are decoded in the same manner. Iterations continue                      which may be assigned per processor [17]. Version 8 of Java
until signals of all UEs are decoded. Order of decoding is                        introduced streams and parallel streams, which enable working
defined by the power allocation among UEs [13] [14] [15].                         with collections of different object types and manipulate them
Power allocation coefficients are linearly dependent on the                       in parallel [18]. Finally, CUDA tool to measure elapsed time
distance from UEs to BS. The data rate for closer UEs and                         of events running on GPU device in Fig. 5.
with lower transmit power is less than for farther but with
higher transmit power [12].
C. Parallel Interference Cancellation
   Fig. 2 shows the block diagram of a PIC receiver. PIC
decodes for each UE from the received composite signal at
the same time. In order to cancel the multi-user interference,
BS then subtracts each estimated message from the received
signal as
   1 Power allocation for PIC-NOMA can also be optimized. However, it
should be noted that this work compares the computation time of SIC and
PIC schemes where the power allocation coefficients do not have any impact.
Still, we mention power allocation here since it is one of the most significant
factors in NOMA networks.                                                                    Fig. 2. Block diagram of a typical PIC scheme.
                   Second International Conference on Computing and Network Communications (CoCoNet’18)                                        75
   Fig. 3 shows the computation times of PIC and SIC as
the number of UEs increase. Dependency of decoding among                                             250
UEs makes SIC execution time even longer as the number                                                            SIC-QPSK
                                                                                                                  PIC-QPSK
of UEs grows, whereas PIC scheme decoding time changes                                                            SIC-QAM-16
                                                                                                     200
                                                                                                                  PIC-QAM-16
marginally. PIC execution time is twice faster than SIC ex-
ecution time for 1000 UEs and it reaches approximately 5
fold difference for 2000 UEs. Execution time of modulation                                           150
                                                                                         Time (ms)
schemes QPSK and QAM-16 do not differ much.
                                                                                                     100
                 12
                                SIC-QPSK
                                PIC-QPSK                                                                 50
                 10             SIC-QAM-16
                                PIC-QAM-16
                 8                                                                                       0
                                                                                                         500            1000              1500               2000          2500
     Time (ms)
                                                                                                                                      Number of UEs
                 6
                                                                                      Fig. 4. Computation time of SIC and PIC on multiple threads for QPSK and
                                                                                      16-QAM modulation schemes.
                 4
                 2                                                                                       10
                                                                                                         9
                                                                                                                   PIC-QPSK
                 0                                                                                                 SIC-QPSK
                      0   200     400    600    800   1000 1200 1400 1600 1800 2000                      8
                                                                                                                   PIC-16-QAM
                                                Number of UEs                                                      SIC-16-QAM
                                                                                                         7
Fig. 3. Computation time of SIC and PIC for QPSK and 16-QAM modulation                                   6
                                                                                             Time (ms)
schemes.
                                                                                                         5
   PIC is found suitable to be implemented on parallel threads                                           4
as discussed above. In Fig. 4, we implement PIC via Java
                                                                                                         3
8 using parallel streams on multi-cores and compare it with
SIC. As a result, execution time for both modulation schemes                                             2
is approximately constant and keep around 100 ms. As can be                                              1
seen, even if it took less time for SIC implementation until
                                                                                                         0
500 UEs, execution time of SIC scheme is as twice as PIC                                                      0   500          1000       1500        2000          2500   3000
when it reaches 2500 UEs. It may be predicted that number of                                                                          Number of UEs
UEs does not affect execution time for PIC on parallel threads.
   Fig. 5 shows SIC and PIC computation times on CUDA                                 Fig. 5. Computation time of SIC and PIC on GPU run with CUDA
                                                                                      architecture for QPSK and 16-QAM modulation schemes.
architecture with C++ programming language. Linear curves
show that SIC execution time grows faster than PIC. Mod-
ulation schemes QPSK and 16-QAM do not differ much in                                 increase. We also evaluated PIC execution time on multi-core
implementation time, only with slight faster performance of                           processor using multi-threading and results reveal that PIC
QPSK. For approximately 2500 UEs execution time reached                               outperforms SIC. Then we implemented multi-thread approach
9 ms for SIC against only 2 ms for PIC.                                               on GPU device with CUDA architecture, which allows initi-
                                               IV. C ONCLUSION                        ating GPU threads equal to number of UEs simultaneously.
                                                                                      PIC outperformed SIC and time difference increases with the
  In this paper, we compared the computation time of SIC                              number of UEs. Breaking the task into parallel streams allowed
and PIC decoding schemes in the uplink of NOMA based                                  us to use processor more efficiently.
cellular networks. Both schemes decode the received signal
by cancelling out interference in power domain. Decoding                                                                              R EFERENCES
approaches resulted in different computation performances.
SIC execution time depends on the number of UEs in the                                 [1] Z. K. Xiang Wei, 5G Mobile Communications. Springer, 2017.
network and increases gradually with the number of UEs. PIC                            [2] R. Vannithamby and S. Talwar, Towards 5G: Applications, Requirements
                                                                                           and Candidate Technologies. John Wiley & Sons, 2017.
has advantage of decoding the signal in parallel, therefore                            [3] B. M. J. Mavromoustakis Constandinos X., Mastorakis George, Internet
execution time grows moderately as the number of UEs                                       of Things (IoT) in 5G Mobile Technologies. Springer, 2016.
76                                      Second International Conference on Computing and Network Communications (CoCoNet’18)
 [4] Y. Saito, A. Benjebbour, Y. Kishiyama, and T. Nakamura, “System-level
     performance evaluation of downlink non-orthogonal multiple access
     (NOMA),” in Personal Indoor and Mobile Radio Communications
     (PIMRC), 2013 IEEE 24th International Symposium on, pp. 611–615,
     IEEE, 2013.
 [5] L. Dai, B. Wang, Y. Yuan, S. Han, I. Chih-Lin, and Z. Wang, “Non-
     orthogonal multiple access for 5g: solutions, challenges, opportunities,
     and future research trends,” IEEE Communications Magazine, vol. 53,
     no. 9, pp. 74–81, 2015.
 [6] A. Benjebbour, Y. Saito, Y. Kishiyama, A. Li, A. Harada, and T. Naka-
     mura, “Concept and practical considerations of non-orthogonal multiple
     access (NOMA) for future radio access,” in Intelligent Signal Processing
     and Communications Systems (ISPACS), 2013 International Symposium
     on, pp. 770–774, IEEE, 2013.
 [7] H. Yan and S. Roy, “Parallel interference cancellation for uplink mul-
     tirate overlay cdma channels,” IEEE transactions on communications,
     vol. 53, no. 1, pp. 152–161, 2005.
 [8] M. Zhang, S. Ahmed, and S. Kim, “Iterative mmse-based soft mimo
     detection with parallel interference cancellation,” IET Communications,
     vol. 11, no. 11, pp. 1775–1781, 2017.
 [9] N. Benvenuto and P. Bisaglia, “Parallel and successive interference
     cancellation for mc-cdma and their near-far resistance,” in Vehicular
     Technology Conference, 2003. VTC 2003-Fall. 2003 IEEE 58th, vol. 2,
     pp. 1045–1049, IEEE, 2003.
[10] T. M. Aamodt, W. W. L. Fung, and T. G. Rogers, “General-purpose
     graphics processor architectures,” Synthesis Lectures on Computer Ar-
     chitecture, vol. 13, no. 2, pp. 1–140, 2018.
[11] M. Al-Imari, P. Xiao, M. A. Imran, and R. Tafazolli, “Uplink non-
     orthogonal multiple access for 5g wireless networks,” in Wireless
     Communications Systems (ISWCS), 2014 11th International Symposium
     on, pp. 781–785, IEEE, 2014.
[12] T. Manglayev, R. C. Kizilirmak, and Y. H. Kho, “Optimum power
     allocation for non-orthogonal multiple access (NOMA),” in Application
     of Information and Communication Technologies (AICT), 2016 IEEE
     10th International Conference on, pp. 1–4, IEEE, 2016.
[13] D. Tse and P. Viswanath, Fundamentals of wireless communication.
     Cambridge university press, 2005.
[14] J. G. Andrews and T. H. Meng, “Optimum power control for succes-
     sive interference cancellation with imperfect channel estimation,” IEEE
     Transactions on Wireless Communications, vol. 2, no. 2, pp. 375–383,
     2003.
[15] T. Manglayev, R. C. Kizilirmak, Y. H. Kho, N. Bazhayev, and I. Lebedev,
     “NOMA with imperfect SIC implementation,” in Smart Technologies,
     IEEE EUROCON 2017-17th International Conference on, pp. 22–25,
     IEEE, 2017.
[16] A. Anwar, B.-C. Seet, and X. J. Li, “PIC-based receiver structure for
     5G downlink NOMA,” in Information, Communications and Signal
     Processing (ICICS), 2015 10th International Conference on, pp. 1–5,
     IEEE, 2015.
[17] P. Watcharawitch and S. Moore, “Jma: the java-multithreading ar-
     chitecture for embedded processors,” in Computer Design: VLSI in
     Computers and Processors, 2002. Proceedings. 2002 IEEE International
     Conference on, pp. 527–529, IEEE, 2002.
[18] Y. Chan, A. Wellings, I. Gray, and N. Audsley, “A distributed stream
     library for java 8,” IEEE Transactions on Big Data, vol. 3, no. 3,
     pp. 262–275, 2017.
                  Second International Conference on Computing and Network Communications (CoCoNet’18)   77