Preface
CONTENTS
1 Introduction
1-1 Elements of a Digital Communication System
1-2 Communication Channels and Their Characteristics
1-3 Mathematical Models for Communication Channels
1-4 A Historical Perspective in the Development of Digital Communications
1-5 Overview of the Book
1-6 Bibliographical Notes and References
2 Probability and Stochastic Processes
2-1 Probability
  2-1-1 Random Variables, Probability Distributions, and Probability Densities
  2-1-2 Functions of Random Variables
  2-1-3 Statistical Averages of Random Variables
  2-1-4 Some Useful Probability Distributions
  2-1-5 Upper Bounds on the Tail Probability
  2-1-6 Sums of Random Variables and the Central Limit Theorem
2-2 Stochastic Processes
  2-2-3 Response of a Linear Time-Invariant System to a Random Input Signal
  2-2-4 Sampling Theorem for Band-Limited Stochastic Processes
  2-2-5 Discrete-Time Stochastic Signals and Systems
  2-2-6 Cyclostationary Processes
2-3 Bibliographical Notes and References
Problems

3 Source Coding
3-1 Mathematical Models for Information
3-2 A Logarithmic Measure of Information
  3-2-1 Average Mutual Information and Entropy
  3-2-2 Information Measures for Continuous Random Variables
3-3 Coding for Discrete Sources
  3-3-1 Coding for Discrete Memoryless Sources
  3-3-2 Discrete Stationary Sources
  3-3-3 The Lempel-Ziv Algorithm
3-4 Coding for Analog Sources—Optimum Quantization
  3-4-1 Rate-Distortion Function
  3-4-2 Scalar Quantization
  3-4-3 Vector Quantization
3-5 Coding Techniques for Analog Sources
  3-5-1 Temporal Waveform Coding
  3-5-2 Spectral Waveform Coding
  3-5-3 Model-Based Source Coding
3-6 Bibliographical Notes and References
Problems
4 Characterization of Communication Signals and Systems
4-1 Representation of Bandpass Signals and Systems
  4-1-1 Representation of Bandpass Signals
  4-1-2 Representation of Linear Bandpass Systems
  4-1-3 Response of a Bandpass System to a Bandpass Signal
  4-1-4 Representation of Bandpass Stationary Stochastic Processes
4-2 Signal Space Representation
  4-2-1 Vector Space Concepts
  4-2-2 Signal Space Concepts
  4-2-3 Orthogonal Expansions of Signals
4-3 Representation of Digitally Modulated Signals
  4-3-1 Memoryless Modulation Methods
  4-3-2 Linear Modulation with Memory
  4-3-3 Nonlinear Modulation Methods with Memory
4-4 Spectral Characteristics of Digitally Modulated Signals
  4-4-1 Power Spectra of Linearly Modulated Signals
  4-4-2 Power Spectra of CPFSK and CPM Signals
  4-4-3 Power Spectra of Modulated Signals with Memory
4-5 Bibliographical Notes and References
Problems
5 Optimum Receivers for the Additive White Gaussian Noise Channel
5-1 Optimum Receiver for Signals Corrupted by AWGN
  5-1-1 Correlation Demodulator
  5-1-2 Matched-Filter Demodulator
  5-1-3 The Optimum Detector
  5-1-4 The Maximum-Likelihood Sequence Detector
  5-1-5 A Symbol-by-Symbol MAP Detector for Signals with Memory
5-2 Performance of the Optimum Receiver for Memoryless Modulation
  5-2-1 Probability of Error for Binary Modulation
  5-2-2 Probability of Error for M-ary Orthogonal Signals
  5-2-3 Probability of Error for M-ary Biorthogonal Signals
  5-2-4 Probability of Error for Simplex Signals
  5-2-5 Probability of Error for M-ary Binary-Coded Signals
  5-2-6 Probability of Error for M-ary PAM
  5-2-7 Probability of Error for M-ary PSK
  5-2-8 Differential PSK (DPSK) and Its Performance
  5-2-9 Probability of Error for QAM
  5-2-10 Comparison of Digital Modulation Methods
5-3 Optimum Receiver for CPM Signals
  5-3-1 Optimum Demodulation and Detection of CPM
  5-3-2 Performance of CPM Signals
  5-3-3 Symbol-by-Symbol Detection of CPM Signals
5-4 Optimum Receiver for Signals with Random Phase in AWGN Channel
  5-4-1 Optimum Receiver for Binary Signals
  5-4-2 Optimum Receiver for M-ary Orthogonal Signals
  5-4-3 Probability of Error for Envelope Detection of M-ary Orthogonal Signals
  5-4-4 Probability of Error for Envelope Detection of Correlated Binary Signals
5-5 Regenerative Repeaters and Link Budget Analysis
  5-5-1 Regenerative Repeaters
  5-5-2 Communication Link Budget Analysis
5-6 Bibliographical Notes and References
Problems
6 Carrier and Symbol Synchronization
6-1 Signal Parameter Estimation
  6-1-1 The Likelihood Function
  6-1-2 Carrier Recovery and Symbol Synchronization in Signal Demodulation
6-2 Carrier Phase Estimation
  6-2-1 Maximum-Likelihood Carrier Phase Estimation
  6-2-2 The Phase-Locked Loop
  6-2-3 Effect of Additive Noise on the Phase Estimate
  6-2-4 Decision-Directed Loops
  6-2-5 Non-Decision-Directed Loops
6-3 Symbol Timing Estimation
  6-3-1 Maximum-Likelihood Timing Estimation
  6-3-2 Non-Decision-Directed Timing Estimation
6-4 Joint Estimation of Carrier Phase and Symbol Timing
6-5 Performance Characteristics of ML Estimators
6-6 Bibliographical Notes and References
Problems
7 Channel Capacity and Coding
7-1 Channel Models and Channel Capacity
  7-1-1 Channel Models
  7-1-2 Channel Capacity
  7-1-3 Achieving Channel Capacity with Orthogonal Signals
  7-1-4 Channel Reliability Functions
7-2 Random Selection of Codes
  7-2-1 Random Coding Based on M-ary Binary-Coded Signals
  7-2-2 Random Coding Based on M-ary Multiamplitude Signals
  7-2-3 Comparison of R_0 with the Capacity of the AWGN Channel
7-3 Communication System Design Based on the Cutoff Rate
7-4 Bibliographical Notes and References
Problems
8 Block and Convolutional Channel Codes
8-1 Linear Block Codes
  8-1-1 The Generator Matrix and the Parity Check Matrix
  8-1-2 Some Specific Linear Block Codes
  8-1-3 Cyclic Codes
  8-1-4 Optimum Soft-Decision Decoding of Linear Block Codes
  8-1-5 Hard-Decision Decoding
  8-1-6 Comparison of Performance between Hard-Decision and Soft-Decision Decoding
  8-1-7 Bounds on Minimum Distance of Linear Block Codes
  8-1-8 Nonbinary Block Codes and Concatenated Block Codes
  8-1-9 Interleaving of Coded Data for Channels with Burst Errors
8-2 Convolutional Codes
  8-2-1 The Transfer Function of a Convolutional Code
  8-2-2 Optimum Decoding of Convolutional Codes—The Viterbi Algorithm
  8-2-3 Probability of Error for Soft-Decision Decoding
  8-2-4 Probability of Error for Hard-Decision Decoding
  8-2-5 Distance Properties of Binary Convolutional Codes
  8-2-6 Nonbinary Dual-k Codes and Concatenated Codes
  8-2-7 Other Decoding Algorithms for Convolutional Codes
  8-2-8 Practical Considerations in the Application of Convolutional Codes
8-3 Coded Modulation for Bandwidth-Constrained Channels
8-4 Bibliographical Notes and References
Problems
9 Signal Design for Band-Limited Channels
9-1 Characterization of Band-Limited Channels
9-2 Signal Design for Band-Limited Channels
  9-2-1 Design of Band-Limited Signals for No Intersymbol Interference—The Nyquist Criterion
  9-2-2 Design of Band-Limited Signals with Controlled ISI—Partial-Response Signals
  9-2-3 Data Detection for Controlled ISI
  9-2-4 Signal Design for Channels with Distortion
9-3 Probability of Error in Detection of PAM
  9-3-1 Probability of Error for Detection of PAM with Zero ISI
  9-3-2 Probability of Error for Detection of Partial-Response Signals
  9-3-3 Probability of Error for Optimum Signals in Channel with Distortion
9-4 Modulation Codes for Spectrum Shaping
9-5 Bibliographical Notes and References
Problems
10 Communication through Band-Limited Linear Filter Channels
10-1 Optimum Receiver for Channels with ISI and AWGN
  10-1-1 Optimum Maximum-Likelihood Receiver
  10-1-2 A Discrete-Time Model for a Channel with ISI
  10-1-3 The Viterbi Algorithm for the Discrete-Time White Noise Filter Model
  10-1-4 Performance of MLSE for Channels with ISI
10-2 Linear Equalization
  10-2-1 Peak Distortion Criterion
  10-2-2 Mean Square Error (MSE) Criterion
  10-2-3 Performance Characteristics of the MSE Equalizer
  10-2-4 Fractionally Spaced Equalizer
10-3 Decision-Feedback Equalization
  10-3-1 Coefficient Optimization
  10-3-2 Performance Characteristics of DFE
  10-3-3 Predictive Decision-Feedback Equalizer
10-4 Bibliographical Notes and References
Problems
11 Adaptive Equalization
11-1 Adaptive Linear Equalizer
  11-1-1 The Zero-Forcing Algorithm
  11-1-2 The LMS Algorithm
  11-1-3 Convergence Properties of the LMS Algorithm
  11-1-4 Excess MSE Due to Noisy Gradient Estimates
  11-1-5 Baseband and Passband Linear Equalizers
11-2 Adaptive Decision-Feedback Equalizer
  11-2-1 Adaptive Equalization of Trellis-Coded Signals
11-3 An Adaptive Channel Estimator for ML Sequence Detection
11-4 Recursive Least-Squares Algorithms for Adaptive Equalization
  11-4-1 Recursive Least-Squares (Kalman) Algorithm
  11-4-2 Linear Prediction and the Lattice Filter
11-5 Self-Recovering (Blind) Equalization
  11-5-1 Blind Equalization Based on Maximum-Likelihood Criterion
  11-5-2 Stochastic Gradient Algorithms
  11-5-3 Blind Equalization Algorithms Based on Second- and Higher-Order Signal Statistics
11-6 Bibliographical Notes and References
Problems
12 Multichannel and Multicarrier Systems
12-1 Multichannel Digital Communication in AWGN Channels
  12-1-1 Binary Signals
  12-1-2 M-ary Orthogonal Signals
12-2 Multicarrier Communications
  12-2-1 Capacity of a Non-Ideal Linear Filter Channel
  12-2-2 An FFT-Based Multicarrier System
12-3 Bibliographical Notes and References
Problems
13 Spread Spectrum Signals for Digital Communications
13-1 Model of Spread Spectrum Digital Communication System
13-2 Direct Sequence Spread Spectrum Signals
  13-2-1 Error Rate Performance of the Decoder
  13-2-2 Some Applications of DS Spread Spectrum Signals
  13-2-3 Effect of Pulsed Interference on DS Spread Spectrum Systems
  13-2-4 Generation of PN Sequences
13-3 Frequency-Hopped Spread Spectrum Signals
  13-3-1 Performance of FH Spread Spectrum Signals in AWGN Channel
  13-3-2 Performance of FH Spread Spectrum Signals in Partial-Band Interference
  13-3-3 A CDMA System Based on FH Spread Spectrum Signals
13-4 Other Types of Spread Spectrum Signals
13-5 Synchronization of Spread Spectrum Signals
13-6 Bibliographical Notes and References
Problems
14 Digital Communication through Fading Multipath Channels
14-1 Characterization of Fading Multipath Channels
  14-1-1 Channel Correlation Functions and Power Spectra
  14-1-2 Statistical Models for Fading Channels
14-2 The Effect of Signal Characteristics on the Choice of a Channel Model
14-3 Frequency-Nonselective, Slowly Fading Channel
14-4 Diversity Techniques for Fading Multipath Channels
  14-4-1 Binary Signals
  14-4-2 Multiphase Signals
  14-4-3 M-ary Orthogonal Signals
14-5 Digital Signaling over a Frequency-Selective, Slowly Fading Channel
  14-5-1 A Tapped-Delay-Line Channel Model
  14-5-2 The RAKE Demodulator
  14-5-3 Performance of RAKE Receiver
14-6 Coded Waveforms for Fading Channels
  14-6-1 Probability of Error for Soft-Decision Decoding of Linear Binary Block Codes
  14-6-2 Probability of Error for Hard-Decision Decoding of Linear Binary Block Codes
  14-6-3 Upper Bounds on the Performance of Convolutional Codes for a Rayleigh Fading Channel
  14-6-4 Use of Constant-Weight Codes and Concatenated Codes for a Fading Channel
  14-6-5 System Design Based on the Cutoff Rate
  14-6-6 Trellis-Coded Modulation
14-7 Bibliographical Notes and References
Problems
15 Multiuser Communications
15-1 Introduction to Multiple Access Techniques
15-2 Capacity of Multiple Access Methods
15-3 Code-Division Multiple Access
  15-3-1 CDMA Signal and Channel Models
  15-3-2 The Optimum Receiver
  15-3-3 Suboptimum Detectors
  15-3-4 Performance Characteristics of Detectors
15-4 Random Access Methods
  15-4-1 ALOHA System and Protocols
  15-4-2 Carrier Sense Systems and Protocols
15-5 Bibliographical Notes and References
Problems
Appendix A The Levinson-Durbin Algorithm
Appendix B Error Probability for Multichannel Binary Signals
Appendix C Error Probabilities for Adaptive Reception of M-phase Signals
  C-1 Mathematical Model for an M-phase Signaling Communications System
  C-2 Characteristic Function and Probability Density Function of the Phase θ
  C-3 Error Probabilities for Slowly Rayleigh Fading Channels
  C-4 Error Probabilities for Time-Invariant and Ricean Fading Channels
Appendix D Square-Root Factorization
References and Bibliography
Index
1
INTRODUCTION
In this book, we present the basic principles that underlie the analysis and
design of digital communication systems. The subject of digital communica-
tions involves the transmission of information in digital form from a source
that generates the information to one or more destinations. Of particular
importance in the analysis and design of communication systems are the
characteristics of the physical channels through which the information is
transmitted. The characteristics of the channel generally affect the design of
the basic building blocks of the communication system. Below, we describe the
elements of a communication system and their functions.
1-1 ELEMENTS OF A DIGITAL COMMUNICATION
SYSTEM
Figure 1-1-1 illustrates the functional diagram and the basic elements of a
digital communication system. The source output may be either an analog
signal, such as audio or video signal, or a digital signal, such as the output of a
teletype machine, that is discrete in time and has a finite number of output
characters. In a digital communication system, the messages produced by the
source are converted into a sequence of binary digits. Ideally, we should like to
represent the source output (message) by as few binary digits as possible. In
other words, we seek an efficient representation of the source output that
results in little or no redundancy. The process of efficiently converting the
output of either an analog or digital source into a sequence of binary digits is
called source encoding or data compression.
FIGURE 1-1-1 Basic elements of a digital communication system.

The sequence of binary digits from the source encoder, which we call the information sequence, is passed to the channel encoder. The purpose of the
channel encoder is to introduce, in a controlled manner, some redundancy in
the binary information sequence that can be used at the receiver to overcome
the effects of noise and interference encountered in the transmission of the
signal through the channel. Thus, the added redundancy serves to increase the
reliability of the received data and improves the fidelity of the received signal.
In effect, redundancy in the information sequence aids the receiver in decoding
the desired information sequence. For example, a (trivial) form of encoding of
the binary information sequence is simply to repeat each binary digit m times,
where m is some positive integer. More sophisticated (nontrivial) encoding
involves taking k information bits at a time and mapping each k-bit sequence
into a unique n-bit sequence, called a code word. The amount of redundancy
introduced by encoding the data in this manner is measured by the ratio n/k.
The reciprocal of this ratio, namely k/n, is called the rate of the code or,
simply, the code rate.
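To make the rate bookkeeping concrete, here is a minimal Python sketch (not taken from the text) of the trivial repetition encoder described above, together with the code rate k/n of a generic (n, k) block code; the particular values m = 3, k = 2, and n = 5 are arbitrary illustrative choices.

    # Illustrative sketch: a rate-1/m repetition encoder and the code rate k/n.
    def repetition_encode(bits, m=3):
        """Repeat each information bit m times (the trivial channel code)."""
        return [b for b in bits for _ in range(m)]

    info_bits = [1, 0, 1, 1, 0]
    coded = repetition_encode(info_bits, m=3)
    print("information bits:", info_bits)
    print("coded bits:      ", coded)
    print("repetition code rate k/n = 1/3")

    # For a generic (n, k) block code, e.g. a hypothetical (5, 2) code:
    k, n = 2, 5
    print("block code rate k/n =", k / n)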
The binary sequence at the output of the channel encoder is passed to the
digital modulator, which serves as the interface to the communications channel.
Since nearly all of the communication channels encountered in practice are
capable of transmitting electrical signals (waveforms), the primary purpose of
the digital modulator is to map the binary information sequence into signal
waveforms. To elaborate on this point, let us suppose that the coded
information sequence is to be transmitted one bit at a time at some uniform
rate R bits/s. The digital modulator may simply map the binary digit 0 into a
waveform s_0(t) and the binary digit 1 into a waveform s_1(t). In this manner, each bit from the channel encoder is transmitted separately. We call this binary modulation. Alternatively, the modulator may transmit b coded information bits at a time by using M = 2^b distinct waveforms s_i(t), i = 0, 1, ..., M - 1, one waveform for each of the 2^b possible b-bit sequences. We call this M-ary modulation (M > 2). Note that a new b-bit sequence enters the modulator
every b/R seconds. Hence, when the channel bit rate R is fixed, the amount of
time available to transmit one of the M waveforms corresponding to a b-bit
sequence is b times the time period in a system that uses binary modulation.
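The bookkeeping for M-ary modulation can be checked in a few lines of Python; the bit rate R, group size b, and bit sequence in this sketch are arbitrary illustrative values, not parameters from the text.

    # Illustrative sketch: grouping b coded bits into one of M = 2**b symbols.
    R = 1000                  # channel bit rate in bits/s (arbitrary example)
    b = 3                     # bits per modulation symbol
    M = 2 ** b                # number of distinct waveforms
    T_binary = 1 / R          # time per waveform with binary modulation (s)
    T_Mary = b / R            # time per waveform with M-ary modulation (s)

    bits = [1, 0, 1, 0, 1, 1]
    groups = [bits[i:i + b] for i in range(0, len(bits), b)]
    indices = [int("".join(map(str, g)), 2) for g in groups]   # 0 .. M-1

    print("M =", M, " symbol interval =", T_Mary, "s =", b, "x the binary interval")
    print("b-bit groups:", groups, "-> waveform indices:", indices)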
The communication channel is the physical medium that is used to send the
signal from the transmitter to the receiver. In wireless transmission, the
channel may be the atmosphere (free space). On the other hand, telephone
channels usually employ a variety of physical media, including wire lines,
optical fiber cables, and wireless (microwave radio). Whatever the physical
medium used for transmission of the information, the essential feature is that
the transmitted signal is corrupted in a random manner by a variety of possible
mechanisms, such as additive thermal noise generated by electronic devices,
man-made noise, e.g., automobile ignition noise, and atmospheric noise, e.g.,
electrical lightning discharges during thunderstorms.
At the receiving end of a digital communications system, the digital
demodulator processes the channel-corrupted transmitted waveform and re-
duces the waveforms to a sequence of numbers that represent estimates of the
transmitted data symbols (binary or M-ary). This sequence of numbers is
passed to the channel decoder, which attempts to reconstruct the original
information sequence from knowledge of the code used by the channel
encoder and the redundancy contained in the received data.
A measure of how well the demodulator and decoder perform is the
frequency with which errors occur in the decoded sequence. More precisely,
the average probability of a bit-error at the output of the decoder is a measure
of the performance of the demodulator-decoder combination. In general, the
probability of error is a function of the code characteristics, the types of
waveforms used to transmit the information over the channel, the transmitter
power, the characteristics of the channel, i.e., the amount of noise, the nature
of the interference, etc., and the method of demodulation and decoding. These
items and their effect on performance will be discussed in detail in subsequent
chapters.
As a final step, when an analog output is desired, the source decoder accepts
the output sequence from the channel decoder and, from knowledge of the
source encoding method used, attempts to reconstruct the original signal from
the source. Due to channel decoding errors and possible distortion introduced
by the source encoder and, perhaps, the source decoder, the signal at the
output of the source decoder is an approximation to the original source output.
The difference or some function of the difference between the original signal
and the reconstructed signal is a measure of the distortion introduced by the
digital communication system.
1-2 COMMUNICATION CHANNELS AND THEIR
CHARACTERISTICS
As indicated in the preceding discussion, the communication channel provides
the connection between the transmitter and the receiver. The physical channel
may be a pair of wires that carry the electrical signal, or an optical fiber that
carries the information on a modulated light beam, or an underwater ocean
channel in which the information is transmitted acoustically, or free space over
which the information-bearing signal is radiated by use of an antenna. Other
media that can be characterized as communication channels are data storage
media, such as magnetic tape, magnetic disks, and optical disks.
One common problem in signal transmission through any channel is additive
noise. In general, additive noise is generated internally by components such as
resistors and solid-state devices used to implement the communication system.
This is sometimes called thermal noise. Other sources of noise and interference
may arise externally to the system, such as interference from other users of the
channel. When such noise and interference occupy the same frequency band as
the desired signal, its effect can be minimized by proper design of the
transmitted signal and its demodulator at the receiver. Other types of signal
degradations that may be encountered in transmission over the channel are
signal attenuation, amplitude and phase distortion, and multipath distortion.
The effects of noise may be minimized by increasing the power in the
transmitted signal. However, equipment and other practical constraints limit
the power level in the transmitted signal. Another basic limitation is the
available channel bandwidth. A bandwidth constraint is usually due to the
physical limitations of the medium and the electronic components used to
implement the transmitter and the receiver. These two limitations result in
constraining the amount of data that can be transmitted reliably over any
communications channel as we shall observe in later chapters. Below, we
describe some of the important characteristics of several communication
channels.
Wireline Channels The telephone network makes extensive use of wire
lines for voice signal transmission, as well as data and video transmission.
Twisted-pair wire lines and coaxial cable are basically guided electromagnetic
channels that provide relatively modest bandwidths. Telephone wire generally
used to connect a customer to a central office has a bandwidth of several
hundred kilohertz (kHz). On the other hand, coaxial cable has a usable
bandwidth of several megahertz (MHz). Figure 1-2-1 illustrates the frequency
range of guided electromagnetic channels, which include waveguides and
optical fibers.
Signals transmitted through such channels are distorted in both amplitude
and phase and further corrupted by additive noise. Twisted-pair wireline
channels are also prone to crosstalk interference from physically adjacent
channels. Because wireline channels carry a large percentage of our daily
communications around the country and the world, much research has been
performed on the characterization of their transmission properties and on
methods for mitigating the amplitude and phase distortion encountered in
signal transmission. In Chapter 9, we describe methods for designing optimum
transmitted signals and their demodulation; in Chapters 10 and 11, we consider the design of channel equalizers that compensate for amplitude and phase distortion on these channels.

FIGURE 1-2-1 Frequency range for guided wire channel.
Fiber Optic Channels Optical fibers offer the communications system
designer a channel bandwidth that is several orders of magnitude larger than
coaxial cable channels. During the past decade, optical fiber cables have been
developed that have a relatively low signal attenuation, and highly reliable photonic devices have been developed for signal generation and signal detection. These technological advances have resulted in a rapid deployment of optical fiber channels, both in domestic telecommunication systems as well as for trans-Atlantic and trans-Pacific communications. With the large bandwidth
available on fiber optic channels, it is possible for telephone companies to offer
subscribers a wide array of telecommunication services, including voice, data,
facsimile, and video.
The transmitter or modulator in a fiber optic communication system is a
light source, either a light-emitting diode (LED) or a laser. Information is
transmitted by varying (modulating) the intensity of the light source with the
message signal. The light propagates through the fiber as a light wave and is
amplified periodically (in the case of digital transmission, it is detected and
regenerated by repeaters) along the transmission path to compensate for signal
attenuation. At the receiver, the light intensity is detected by a photodiode,
whose output is an electrical signal that varies in direct proportion to the
power of the light impinging on the photodiode. Sources of noise in fiber optic
channels are photodiodes and electronic amplifiers.
It is envisioned that optical fiber channels will replace nearly all wireline
channels in the telephone network by the turn of the century.
Wireless Electromagnetic Channels In wireless communication systems,
electromagnetic energy is coupled to the propagation medium by an antenna
which serves as the radiator. The physical size and the configuration of the
antenna depend primarily on the frequency of operation. To obtain efficient
radiation of electromagnetic energy, the antenna must be longer than 1/10 of the wavelength. Consequently, a radio station transmitting in the AM frequency band, say at f_c = 1 MHz (corresponding to a wavelength of λ = c/f_c = 300 m), requires an antenna of at least 30 m. Other important characteristics and
attributes of antennas for wireless transmission are described in Chapter 5.
Figure 1-2-2 illustrates the various frequency bands of the electromagnetic
spectrum. The mode of propagation of electromagnetic waves in the atmo-
sphere and in free space may be subdivided into three categories, namely,
ground-wave propagation, sky-wave propagation, and line-of-sight (LOS)
propagation. In the VLF and audio frequency bands, where the wavelengths exceed 10 km, the earth and the ionosphere act as a waveguide for electromagnetic wave propagation. In these frequency ranges, communication signals practically propagate around the globe. For this reason, these frequency bands are primarily used to provide navigational aids from shore to ships around the world. The channel bandwidths available in these frequency bands are
relatively small (usually 1-10% of the center frequency), and hence the
information that is transmitted through these channels is of relatively slow
speed and generally confined to digital transmission. A dominant type of noise
at these frequencies is generated from thunderstorm activity around the globe,
especially in tropical regions. Interference results from the many users of these
frequency bands.
FIGURE 1-2-2 Frequency range for wireless electromagnetic channels. (Adapted from Carlson (1975), 2nd edition, © McGraw-Hill Book Company. Reprinted with permission of the publisher.)

FIGURE 1-2-3 Illustration of ground-wave propagation.

FIGURE 1-2-4 Illustration of sky-wave propagation.

Ground-wave propagation, as illustrated in Fig. 1-2-3, is the dominant mode of propagation for frequencies in the MF band (0.3-3 MHz). This is the frequency band used for AM broadcasting and maritime radio broadcasting. In AM broadcasting, the range with ground-wave propagation of even the more powerful radio stations is limited to about 150 km. Atmospheric noise,
man-made noise, and thermal noise from electronic components at the receiver
are dominant disturbances for signal transmission in the MF band.
Sky-wave propagation, as illustrated in Fig. 1-2-4, results from transmitted
signals being reflected (bent or refracted) from the ionosphere, which consists
of several layers of charged particles ranging in altitude from 50 to 400km
above the surface of the earth. During the daytime hours, the heating of the
lower atmosphere by the sun causes the formation of the lower layers at
altitudes below 120 km. These lower layers, especially the D-layer, serve to
absorb frequencies below 2 MHz, thus severely limiting sky-wave propagation
of AM radio broadcast. However, during the night-time hours, the electron
density in the lower layers of the ionosphere drops sharply and the frequency
absorption that occurs during the daytime is significantly reduced. As a
consequence, powerful AM radio broadcast stations can propagate over large
distances via sky wave over the F-layer of the ionosphere, which ranges from
140 to 400km above the surface of the earth.
A frequently occurring problem with electromagnetic wave propagation via
sky wave in the HF frequency range is signal multipath. Signal multipath occurs
when the transmitted signal arrives at the receiver via multiple propagation
paths at different delays. It generally results in intersymbol interference in a
digital communication system. Moreover, the signal components arriving via
different propagation paths may add destructively, resulting in a phenomenon
called signal fading, which most people have experienced when listening to a
distant radio station at night when sky wave is the dominant propagation
mode. Additive noise at HF is a combination of atmospheric noise and thermal
noise.
Sky-wave ionospheric propagation ceases to exist at frequencies above
approximately 30 MHz, which is the end of the HF band. However, it is
possible to have ionospheric scatter propagation at frequencies in the range
30-60 MHz, resulting from signal scattering from the lower ionosphere. It is
also possible to communicate over distances of several hundred miles by use of
tropospheric scattering at frequencies in the range 40-300 MHz. Troposcatter
results from signal scattering due to particles in the atmosphere at altitudes of
10 miles or less. Generally, ionospheric scatter and tropospheric scatter
involve large signal propagation losses and require a large amount of
transmitter power and relatively large antennas.
Frequencies above 30 MHz propagate through the ionosphere with rela-
tively little loss and make satellite and extraterrestrial communications possible. Hence, at frequencies in the VHF band and higher, the dominant mode of electromagnetic propagation is line-of-sight (LOS) propagation. For terrestrial communication systems, this means that the transmitter and receiver antennas must be in direct LOS with relatively little or no obstruction. For this reason, television stations transmitting in the VHF and UHF frequency bands
mount their antennas on high towers to achieve a broad coverage area.
In general, the coverage area for LOS propagation is limited by the
curvature of the earth. If the transmitting antenna is mounted at a height of h m above the surface of the earth, the distance to the radio horizon, assuming no physical obstructions such as mountains, is approximately d = √(15h) km. For
example, a TV antenna mounted on a tower of 300m in height provides a
coverage of approximately 67 km. As another example, microwave radio relay
systems used extensively for telephone and video transmission at frequencies
above 1 GHz have antennas mounted on tall towers or on the top of tall
buildings.
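As a quick numerical check of the radio-horizon relation quoted above, the following Python sketch evaluates d = √(15h) km for the 300 m tower example; the function name is ours, not the book's.

    import math

    def radio_horizon_km(h_meters):
        """Distance to the radio horizon, d = sqrt(15 h) km, with h in metres."""
        return math.sqrt(15 * h_meters)

    print(round(radio_horizon_km(300), 1), "km")   # about 67 km, as stated above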
The dominant noise limiting the performance of a communication system in
VHF and UHF frequency ranges is thermal noise generated in the receiver
front end and cosmic noise picked up by the antenna. At frequencies in the
SHF band above 10 GHz, atmospheric conditions play a major role in signal
propagation. For example, at 10GHz, the attenuation ranges from about
0.003 dB/km in light rain to about 0.3 dB/km in heavy rain. At 100 GHz, the
attenuation ranges from about 0.1 dB/km in light rain to about 6dB/km in
heavy rain. Hence, in this frequency range, heavy rain introduces extremely
high propagation losses that can result in service outages (total breakdown in
the communication system).
At frequencies above the EHF (extremely high frequency) band, we have
the infrared and visible light regions of the electromagnetic spectrum, which
can be used to provide LOS optical communication in free space. To date,
these frequency bands have been used in experimental communication
systems, such as satellite-to-satellite links.
Underwater Acoustic Channels Over the past few decades, ocean ex-
ploration activity has been steadily increasing. Coupled with this increase is the
need to transmit data, collected by sensors placed under water, to the surface
of the ocean. From there, it is possible to relay the data via a satellite to a data
collection center.
Electromagnetic waves do not propagate over long distances under water
except at extremely low frequencies. However, the transmission of signals at
such low frequencies is prohibitively expensive because of the large and
powerful transmitters required. The attenuation of electromagnetic waves in
water can be expressed in terms of the skin depth, which is the distance a signal
is attenuated by 1/e. For sea water, the skin depth is δ = 250/√f, where f is expressed in Hz and δ is in m. For example, at 10 kHz the skin depth is 2.5 m. In contrast, acoustic signals propagate over distances of tens and even
hundreds of kilometers.
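The 2.5 m figure quoted above follows directly from the skin-depth relation δ = 250/√f; a minimal check in Python:

    import math

    def sea_water_skin_depth(f_hz):
        """Skin depth delta = 250 / sqrt(f) metres for sea water, f in Hz."""
        return 250.0 / math.sqrt(f_hz)

    print(sea_water_skin_depth(10e3), "m")   # 2.5 m at 10 kHz, matching the text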
An underwater acoustic channel is characterized as a multipath channel due
to signal reflections from the surface and the bottom of the sea. Because of
wave motion, the signal multipath components undergo time-varying propaga-
tion delays that result in signal fading. In addition, there is frequency-
dependent attenuation, which is approximately proportional to the square of
the signal frequency. The sound velocity is nominally about 1500 m/s, but the
actual value will vary either above or below the nominal value depending on
the depth at which the signal propagates.
Ambient ocean acoustic noise is caused by shrimp, fish, and various
mammals. Near harbors, there is also man-made acoustic noise in addition to
the ambient noise. In spite of this hostile environment, it is possible to design
and implement efficient and highly reliable underwater acoustic communica-
tion systems for transmitting digital signals over large distances.
Storage Channels Information storage and retrieval systems constitute a
very significant part of data-handling activities on a daily basis. Magnetic tape,
including digital audio tape and video tape, magnetic disks used for storing
large amounts of computer data, optical disks used for computer data storage,
and compact disks are examples of data storage systems that can be
characterized as communication channels. The process of storing data on a
magnetic tape or a magnetic or optical disk is equivalent to transmitting a
signal over a telephone or a radio channel. The readback process and the
signal processing involved in storage systems to recover the stored information
are equivalent to the functions performed by a receiver in a telephone or radio
communication system to recover the transmitted information.
Additive noise generated by the electronic components and interference
from adjacent tracks is generally present in the readback signal of a storage
system, just as is the case in a telephone or a radio communication system.
The amount of data that can be stored is generally limited by the size of the
disk or tape and the density (number of bits stored per square inch) that can be
achieved by the write/read electronic systems and heads. For example, a
packing density of 10^9 bits per square inch has been recently demonstrated in
an experimental magnetic disk storage system. (Current commercial magnetic
storage products achieve a much lower density.) The speed at which data can
be written on a disk or tape and the speed at which it can be read back are also
limited by the associated mechanical and electrical subsystems that constitute
an information storage system.
Channel coding and modulation are essential components of a well-designed
digital magnetic or optical storage system. In the readback process, the signal is
demodulated and the added redundancy introduced by the channel encoder is
used to correct errors in the readback signal.
1-3 MATHEMATICAL MODELS FOR
COMMUNICATION CHANNELS
In the design of communication systems for transmitting information through
physical channels, we find it convenient to construct mathematical models that
reflect the most important characteristics of the transmission medium. Then,
the mathematical model for the channel is used in the design of the channel
encoder and modulator at the transmitter and the demodulator and channel
decoder at the receiver. Below, we provide a brief description of the
channel models that are frequently used to characterize many of the physical
channels that we encounter in practice.
The Additive Noise Channel The simplest mathematical model for a
communication channel is the additive noise channel, illustrated in Fig. 1-3-1.
In this model, the transmitted signal s(t) is corrupted by an additive random
noise process n(t). Physically, the additive noise process may arise from
electronic components and amplifiers at the receiver of the communication
system, or from interference encountered in transmission (as in the case of
radio signal transmission).
If the noise is introduced primarily by electronic components and amplifiers
at the receiver, it may be characterized as thermal noise. This type of noise is
characterized statistically as a gaussian noise process. Hence, the resulting
mathematical model for the channel is usually called the additive gaussian
noise channel. Because this channel model applies to a broad class of physical communication channels and because of its mathematical tractability, this is the predominant channel model used in our communication system analysis and design. Channel attenuation is easily incorporated into the model. When the signal undergoes attenuation in transmission through the channel, the received signal is

r(t) = α s(t) + n(t)     (1-3-1)

where α is the attenuation factor.
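A minimal discrete-time simulation of the additive noise channel (1-3-1), assuming white Gaussian noise samples; the sampling rate, test signal, attenuation, and noise level below are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    fs = 1000                        # sampling rate in Hz (illustrative)
    t = np.arange(0, 1.0, 1 / fs)    # one second of samples
    s = np.cos(2 * np.pi * 10 * t)   # an arbitrary transmitted signal s(t)

    alpha = 0.5                      # attenuation factor
    sigma = 0.1                      # noise standard deviation (illustrative)
    n = sigma * rng.standard_normal(t.size)   # additive Gaussian noise n(t)

    r = alpha * s + n                # received signal r(t) = alpha s(t) + n(t)
    print("received-signal power:", np.mean(r ** 2))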
The Linear Filter Channel In some physical channels, such as wireline
telephone channels, filters are used to ensure that the transmitted signals do
not exceed specified bandwidth limitations and thus do not interfere with one
another. Such channels are generally characterized mathematically as linear
filter channels with additive noise, as illustrated in Fig. 1-3-2.

FIGURE 1-3-1 The additive noise channel.

FIGURE 1-3-2 The linear filter channel with additive noise.

Hence, if the
channel input is the signal s(t), the channel output is the signal
r(t) = s(t) * c(t) + n(t)
     = ∫_{-∞}^{∞} c(τ) s(t - τ) dτ + n(t)     (1-3-2)
where c(t) is the impulse response of the linear filter and * denotes
convolution.
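In discrete time the filtering in (1-3-2) is an ordinary convolution of the sampled signal with the sampled impulse response; the three-tap response and noise level in this sketch are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(1)

    s = rng.choice([-1.0, 1.0], size=20)    # an arbitrary antipodal data signal
    c = np.array([0.7, 0.2, 0.1])           # illustrative channel impulse response
    n = 0.05 * rng.standard_normal(s.size + c.size - 1)

    r = np.convolve(s, c) + n               # discrete-time analogue of r = s * c + n
    print(r[:5])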
The Linear Time-Variant Filter Channel Physical channels such as under-
water acoustic channels and ionospheric radio channels that result in time-
variant multipath propagation of the transmitted signal may be characterized
mathematically as time-variant linear filters. Such linear filters are charac-
terized by a time-variant channel impulse response c(τ; t), where c(τ; t) is the response of the channel at time t due to an impulse applied at time t - τ. Thus, τ represents the "age" (elapsed-time) variable. The linear time-variant filter channel with additive noise is illustrated in Fig. 1-3-3. For an input signal s(t), the channel output signal is

r(t) = s(t) * c(τ; t) + n(t)
     = ∫_{-∞}^{∞} c(τ; t) s(t - τ) dτ + n(t)     (1-3-3)
A good model for multipath signal propagation through physical channels,
such as the ionosphere (at frequencies below 30 MHz) and mobile cellular
radio channels, is a special case of (1-3-3) in which the time-variant impulse
response has the form
c(τ; t) = Σ_{k=1}^{L} a_k(t) δ(τ - τ_k)     (1-3-4)

FIGURE 1-3-3 Linear time-variant filter channel with additive noise.
where the {a_k(t)} represent the possibly time-variant attenuation factors for the L multipath propagation paths and {τ_k} are the corresponding time delays. If
(1-3-4) is substituted into (1-3-3), the received signal has the form
r(t) = Σ_{k=1}^{L} a_k(t) s(t - τ_k) + n(t)     (1-3-5)
Hence, the received signal consists of L multipath components, where each
component is attenuated by {a_k(t)} and delayed by {τ_k}.
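The multipath model (1-3-5) is easy to simulate when the path gains and delays are frozen in time; in the sketch below the gains a_k, delays τ_k, sampling rate, and test signal are all illustrative choices, and a real channel would of course have time-varying a_k(t).

    import numpy as np

    fs = 1000                                  # sample rate in Hz (illustrative)
    t = np.arange(0, 0.2, 1 / fs)
    s = np.cos(2 * np.pi * 50 * t)             # arbitrary transmitted signal

    gains = [1.0, 0.6, 0.3]                    # a_k: path attenuation factors
    delays = [0.0, 0.004, 0.010]               # tau_k: path delays in seconds

    def delayed(x, tau, fs):
        """Delay x by the integer number of samples closest to tau."""
        d = int(round(tau * fs))
        return np.concatenate([np.zeros(d), x[:len(x) - d]])

    rng = np.random.default_rng(2)
    n = 0.05 * rng.standard_normal(t.size)

    # r(t) = sum_k a_k s(t - tau_k) + n(t)
    r = sum(a * delayed(s, tau, fs) for a, tau in zip(gains, delays)) + n
    print(r[:5])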
The three mathematical models described above adequately characterize the
great majority of the physical channels encountered in practice. These three
channel models are used in this text for the analysis and design of communica-
tion systems.
1-4 A HISTORICAL PERSPECTIVE IN THE
DEVELOPMENT OF DIGITAL COMMUNICATIONS
It is remarkable that the earliest form of electrical communication, namely
telegraphy, was a digital communication system. The electric telegraph was
developed by Samuel Morse and was demonstrated in 1837. Morse devised the
variable-length binary code in which letters of the English alphabet are
represented by a sequence of dots and dashes (code words). In this code, more
frequently occurring letters are represented by short code words, while letters
occurring less frequently are represented by longer code words. Thus, the
Morse code was the precursor of the variable-length source coding methods
described in Chapter 3.
Nearly 40 years later, in 1875, Emile Baudot devised a code for telegraphy
in which every letter was encoded into fixed-length binary code words of length
5. In the Baudot code, binary code elements are of equal length and designated
as mark and space.
Although Morse is responsible for the development of the first electrical
digital communication system (telegraphy), the beginnings of what we now
regard as modern digital communications stem from the work of Nyquist
(1924), who investigated the problem of determining the maximum signaling
rate that can be used over a telegraph channel of a given bandwidth without
intersymbol interference. He formulated a model of a telegraph system in
which a transmitted signal has the general form
s(t) = Σ_n a_n g(t - nT)     (1-4-1)

where g(t) represents a basic pulse shape and {a_n} is the binary data sequence of {±1} transmitted at a rate of 1/T bits/s. Nyquist set out to determine the
optimum pulse shape that was bandlimited to W Hz and maximized the bit rate
under the constraint that the pulse caused no intersymbol interference at the14 picitaL COMMUNICATIONS
sampling times kT, k = 0, ±1, ±2, .... His studies led him to conclude that the maximum pulse rate is 2W pulses/s. This rate is now called the Nyquist rate. Moreover, this pulse rate can be achieved by using the pulses g(t) = sin(2πWt)/(2πWt). This pulse shape allows recovery of the data without
intersymbol interference at the sampling instants. Nyquist’s result is equivalent
to a version of the sampling theorem for bandlimited signals, which was later
stated precisely by Shannon (1948). The sampling theorem states that a signal
of bandwidth W can be reconstructed from samples taken at the Nyquist rate of 2W samples/s using the interpolation formula

s(t) = Σ_n s(n/2W) sin[2πW(t - n/2W)] / [2πW(t - n/2W)]     (1-4-2)
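The interpolation formula (1-4-2) can be demonstrated numerically by sampling a bandlimited signal at the Nyquist rate 2W and rebuilding it from its samples; the bandwidth, test tone, and truncation of the (in principle infinite) sum below are illustrative choices.

    import numpy as np

    W = 4.0                                   # bandwidth in Hz (illustrative)
    fs = 2 * W                                # Nyquist rate, 2W samples/s
    n = np.arange(-40, 41)                    # finite set of sample indices
    samples = np.cos(2 * np.pi * 3.0 * n / fs)    # samples of a 3 Hz tone (< W)

    t = np.linspace(-2, 2, 1001)              # reconstruction instants
    # sin[2 pi W (t - n/2W)] / [2 pi W (t - n/2W)] equals sinc(2W t - n),
    # where sinc(x) = sin(pi x)/(pi x) as implemented by np.sinc.
    s_hat = sum(s_k * np.sinc(2 * W * t - k) for k, s_k in zip(n, samples))

    err = np.max(np.abs(s_hat - np.cos(2 * np.pi * 3.0 * t)))
    print("max reconstruction error:", err)   # small; limited by truncating the sum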
In light of Nyquist's work, Hartley (1928) considered the issue of the
amount of data that can be transmitted reliably over a bandlimited channel
when multiple amplitude levels are used. Due to the presence of noise and
other interference, Hartley postulated that the receiver can reliably estimate
the received signal amplitude to some accuracy, say A_δ. This investigation led
Hartley to conclude that there is a maximum data rate that can be
communicated reliably over a bandlimited channel when the maximum signal amplitude is limited to A_max (fixed power constraint) and the amplitude resolution is A_δ.
Another significant advance in the development of communications was the
work of Wiener (1942), who considered the problem of estimating a desired
signal waveform s(t) in the presence of additive noise n(t), based on observation of the received signal r(t) = s(t) + n(t). This problem arises in signal demodulation. Wiener determined the linear filter whose output is the best mean-square approximation to the desired signal s(t). The resulting filter is called the optimum linear (Wiener) filter.
Hartley's and Nyquist’s results on the maximum transmission rate of digital
information were precursors to the work of Shannon (1948a,b), who establ-
ished the mathematical foundations for information transmission and derived
the fundamental limits for digital communication systems. In his pioneering
work, Shannon formulated the basic problem of reliable transmission of
information in statistical terms, using probabilistic models for information
sources and communication channels. Based on such a statistical formulation,
he adopted a logarithmic measure for the information content of a source. He
also demonstrated that the effect of a transmitter power constraint, a
bandwidth constraint, and additive noise can be associated with the channel
and incorporated into a single parameter, called the channel capacity. For example, in the case of an additive white (spectrally flat) gaussian noise
interference, an ideal bandlimited channel of bandwidth W has a capacity C
given by
C = W log_2 (1 + P/(W N_0))  bits/s     (1-4-3)

where P is the average transmitted power and N_0 is the power spectral density of the additive noise. The significance of the channel capacity is as follows: If the information rate R from the source is less than C (R < C), then it is theoretically possible to achieve reliable (error-free) transmission through the channel by appropriate coding. On the other hand, if R > C, reliable transmission is not possible regardless of the amount of signal processing
performed at the transmitter and receiver. Thus, Shannon established basic
limits on communication of information, and gave birth to a new field that is
now called information theory.
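The capacity formula (1-4-3) is straightforward to evaluate; the bandwidth, noise density, and power values in the sketch below are arbitrary illustrative numbers, chosen only to show how C grows with the signal-to-noise ratio P/(W N_0).

    import math

    def awgn_capacity(W, P, N0):
        """Channel capacity C = W log2(1 + P/(W N0)) in bits/s, as in (1-4-3)."""
        return W * math.log2(1 + P / (W * N0))

    W = 3000.0            # bandwidth in Hz (illustrative)
    N0 = 1e-8             # noise power spectral density in W/Hz (illustrative)
    for P in (1e-5, 1e-4, 1e-3):              # transmitted powers (illustrative)
        snr = P / (W * N0)
        print(f"P = {P:g} W  SNR = {snr:7.2f}  C = {awgn_capacity(W, P, N0):9.1f} bits/s")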
Another important contribution to the field of digital communication is the work of Kotelnikov (1947), who provided a coherent analysis of the various
digital communication systems based on a geometrical approach. Kotelnikov's
approach was later expanded by Wozencraft and Jacobs (1965).
Following Shannon's publications came the classic work of Hamming
(1950) on error-detecting and error-correcting codes to combat the detrimental
effects of channel noise. Hamming’s work stimulated many researchers in the
years that followed, and a variety of new and powerful codes were discovered,
many of which are used today in the implementation of modern communica-
tion systems.
The increase in demand for data transmission during the last three to four
decades, coupled with the development of more sophisticated integrated
circuits, has led to the development of very efficient and more reliable digital
communication systems. In the course of these developments, Shannon's
original results and the generalization of his results on maximum transmission
limits over a channel and on bounds on the performance achieved have served
as benchmarks for any given communication system design. The theoretical
limits derived by Shannon and other researchers that contributed to the
development of information theory serve as an ultimate goal in the continuing
efforts to design and develop more efficient digital communication systems.
There have been many new advances in the area of digital communications
following the early work of Shannon, Kotelnikov, and Hamming. Some of the
most notable developments are the following:
* The development of new block codes by Muller (1954), Reed (1954),
Reed and Solomon (1960), Bose and Ray-Chaudhuri (1960a,b), and Goppa
(1970, 1971).
* The development of concatenated codes by Forney (1966).
* The development of computationally efficient decoding of BCH codes,
e.g., the Berlekamp—Massey algorithm (see Chien, 1964; Berlekamp, 1968).
* The development of convolutional codes and decoding algorithms by
Wozencraft and Reiffen (1961), Fano (1963), Zigangirov (1966), Jelinek
(1969), Forney (1970, 1972), and Viterbi (1967, 1971).
* The development of trellis-coded modulation by Ungerboeck (1982),
Forney et al. (1984), Wei (1987), and others.
* The development of efficient source encoding algorithms for data
compression, such as those devised by Ziv and Lempel (1977, 1978) and Linde
et al. (1980).
1-5 OVERVIEW OF THE BOOK
Chapter 2 presents a brief review of the basic notions in the theory of
probability and random processes. Our primary objectives in this chapter are
to present results that are used throughout the book and to establish some
necessary notation.
In Chapter 3, we provide an introduction to source coding for discrete and
analog sources. Included in this chapter are the Huffman coding algorithm and
the Lempel-Ziv algorithm for discrete sources, and scalar and vector quantiza-
tion techniques for analog sources.
Chapter 4 treats the characterization of communication signals and systems
from a mathematical viewpoint. Included in this chapter is a geometric
representation of signal waveforms used for digital communications.
Chapters 5-8 are focused on modulation/demodulation and channel
coding/decoding for the additive white gaussian noise channel. The emphasis
is on optimum demodulation and decoding techniques and their performance.
The design of efficient modulators and demodulators for linear filter
channels with distortion is treated in Chapters 9-11. The focus is on signal
design and on channel equalization methods to compensate for the channel
distortion.
The final four chapters treat several more specialized topics. Chapter 12
treats multichannel and multicarrier communication systems. Chapter 13 is
focused on spread spectrum signals for digital communications and their
performance characteristics. Chapter 14 provides an in-depth treatment of
communication through fading multipath channels. Included in this treatment
is a description of channel characterization, signal design and demodulation
techniques and their performance, and coding/decoding techniques and their
performance. The last chapter of the book is focused on multiuser communica-
tion systems and multiple access methods.
1-6 BIBLIOGRAPHICAL NOTES AND REFERENCES
There are several historical treatments regarding the development of radio and
telecommunications during the past century. These may be found in the books
by McMahon (1984), Millman (1984), and Ryder and Fink (1984). We have already cited the classical works of Nyquist (1924), Hartley (1928), Kotelnikov
(1947), Shannon (1948), and Hamming (1950), as well as some of the more
important advances that have occurred in the field since 1950. The collected
papers by Shannon have been published by IEEE Press in a book edited by
Sloane and Wyner (1993). Other collected works published by the IEEE Press
that might be of interest to the reader are Key Papers in the Development of
Coding Theory, edited by Berlekamp (1974), and Key Papers in the
Development of Information Theory, edited by Slepian (1974).

2
PROBABILITY AND STOCHASTIC PROCESSES
The theory of probability and stochastic processes is an essential mathematical
tool in the design of digital communication systems. This subject is important
in the statistical modeling of sources that generate the information, in the
digitization of the source output, in the characterization of the channel through
which the digital information is transmitted, in the design of the receiver that
processes the information-bearing signal from the channel, and in the
evaluation of the performance of the communication system. Our coverage of
this rich and interesting subject is brief and limited in scope. We present a
number of definitions and basic concepts in the theory of probability and
stochastic processes and we derive several results that are important in the
design of efficient digital communication systems and in the evaluation of their
performance.
We anticipate that most readers have had some prior exposure to the theory
of probability and stochastic processes, so that our treatment serves primarily
as a review. Some readers, however, who have had no previous exposure may
find the presentation in this chapter extremely brief. These readers will benefit
from additional reading of engineering-level treatments of the subject found in
the texts by Davenport and Root (1958), Davenport (1970), Papoulis (1984),
Helstrom (1991), and Leon-Garcia (1994).
2-1 PROBABILITY
Let us consider an experiment, such as the rolling of a die, with a number of
possible outcomes. The sample space S of the experiment consists of the set of
all possible outcomes. In the case of the die,
S = {1, 2, 3, 4, 5, 6}     (2-1-1)
where the integers 1,...,6 represent the number of dots on the six faces of the
die. These six possible outcomes are the sample points of the experiment. An
event is a subset of S, and may consist of any number of sample points. For example, the event A defined as

A = {2, 4}     (2-1-2)
consists of the outcomes 2 and 4. The complement of the event A, denoted by Ā, consists of all the sample points in S that are not in A and, hence,

Ā = {1, 3, 5, 6}     (2-1-3)
Two events are said to be mutually exclusive if they have no sample points in common—that is, if the occurrence of one event excludes the occurrence of the other. For example, if A is defined as in (2-1-2) and the event B is defined as
B={1,3, 6} (2-1-4)
then A and B are mutually exclusive events. Similarly, A and Ā are mutually
exclusive events.
The union (sum) of two events is an event that consists of all the sample
points in the two events. For example, if B is the event defined in (2-1-4) and C
is the event defined as
C = {1, 2, 3}     (2-1-5)
then, the union of B and C, denoted by B ∪ C, is the event

D = B ∪ C = {1, 2, 3, 6}     (2-1-6)
Similarly, A ∪ Ā = S, where S is the entire sample space or the certain event.
On the other hand, the intersection of two events is an event that consists of
the points that are common to the two events. Thus, if E = B ∩ C represents the intersection of the events B and C, defined by (2-1-4) and (2-1-5), respectively, then

E = {1, 3}
When the events are mutually exclusive, the intersection is the null event, denoted as ∅. For example, A ∩ B = ∅, and A ∩ Ā = ∅. The definitions of
union and intersection are extended to more than two events in a straightfor-
ward manner.
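These set operations map directly onto Python's built-in set type; the sketch below simply reproduces the die events used above.

    S = {1, 2, 3, 4, 5, 6}      # sample space of the die
    A = {2, 4}
    B = {1, 3, 6}
    C = {1, 2, 3}

    print(S - A)                # complement of A: {1, 3, 5, 6}
    print(B | C)                # union of B and C: {1, 2, 3, 6}
    print(B & C)                # intersection of B and C: {1, 3}
    print(A & B)                # empty set: A and B are mutually exclusive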
Associated with each event A contained in S is its probability P(A). In the
assignment of probabilities to events, we adopt an axiomatic viewpoint. That is, we postulate that the probability of the event A satisfies the condition P(A) ≥ 0. We also postulate that the probability of the sample space (certain event) is P(S) = 1. The third postulate deals with the probability of mutually exclusive events. Suppose that A_i, i = 1, 2, ..., are a (possibly infinite) number of events in the sample space S such that

A_i ∩ A_j = ∅,   i ≠ j = 1, 2, ...

Then the probability of the union of these mutually exclusive events satisfies the condition

P(⋃_i A_i) = Σ_i P(A_i)     (2-1-7)

For example, in a roll of a fair die, each possible outcome is assigned the probability 1/6. The event A defined by (2-1-2) consists of two mutually exclusive subevents or outcomes and, hence, P(A) = 2/6 = 1/3. Also, the probability of the event A ∪ B, where A and B are the mutually exclusive events defined by (2-1-2) and (2-1-4), respectively, is P(A) + P(B) = 1/3 + 1/2 = 5/6.
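The die probabilities just quoted follow from counting sample points; a minimal check in Python:

    from fractions import Fraction

    S = {1, 2, 3, 4, 5, 6}
    A = {2, 4}
    B = {1, 3, 6}

    def prob(event, sample_space=S):
        """Probability of an event when all outcomes are equally likely."""
        return Fraction(len(event), len(sample_space))

    print(prob(A))              # 1/3
    print(prob(A | B))          # 5/6
    print(prob(A) + prob(B))    # 5/6, since A and B are mutually exclusive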
Joint Events and Joint Probabilities Instead of dealing with a single
experiment, let us perform two experiments and consider their outcomes. For
example, the two experiments may be two separate tosses of a single die or a
single toss of two dice. In either case, the sample space S consists of the 36
two-tuples (i,j) where i, j= 1,2,...,6. If the dice are fair, each point in the
sample space is assigned the probability 1/36. We may now consider joint events, such as {i is even, j = 3}, and determine the associated probabilities of such events from knowledge of the probabilities of the sample points.
In general, if one experiment has the possible outcomes A_i, i = 1, 2, ..., n, and the second experiment has the possible outcomes B_j, j = 1, 2, ..., m, then the combined experiment has the possible joint outcomes (A_i, B_j), i = 1, 2, ..., n, j = 1, 2, ..., m. Associated with each joint outcome (A_i, B_j) is the joint probability P(A_i, B_j), which satisfies the condition

0 ≤ P(A_i, B_j) ≤ 1

Assuming that the outcomes B_j, j = 1, 2, ..., m, are mutually exclusive, it follows that

Σ_{j=1}^{m} P(A_i, B_j) = P(A_i)     (2-1-8)

Similarly, if the outcomes A_i, i = 1, 2, ..., n, are mutually exclusive, then

Σ_{i=1}^{n} P(A_i, B_j) = P(B_j)     (2-1-9)
Furthermore, if all the outcomes of the two experiments are mutually exclusive, then

Σ_{i=1}^{n} Σ_{j=1}^{m} P(A_i, B_j) = 1     (2-1-10)

The generalization of the above treatment to more than two experiments is straightforward.
Conditional Probabilities Consider a combined experiment in which a
joint event occurs with probability P(A, B). Suppose that the event B has
occurred and we wish to determine the probability of occurrence of the event
A. This is called the conditional probability of the event A given the occurrence of the event B and is defined as

P(A | B) = P(A, B)/P(B)     (2-1-11)

provided P(B) > 0. In a similar manner, the probability of the event B conditioned on the occurrence of the event A is defined as

P(B | A) = P(A, B)/P(A)     (2-1-12)

provided P(A) > 0. The relations in (2-1-11) and (2-1-12) may also be expressed as

P(A, B) = P(A | B)P(B) = P(B | A)P(A)     (2-1-13)
The relations in (2-1-11), (2-1-12), and (2-1-13) also apply to a single
experiment in which A and B are any two events defined on the sample space S
and P(A, B) is interpreted as the probability of the 4 8. That is, P(A, B)
denotes the simultaneous occurrence of A and B. For example, consider the
events B and C given by (2-1-4) and (2-1-5), respectively, for the single toss of
a die. The joint event consists of the sample points {1,3}. The conditional
probability of the event C given that B occurred is
=i
3
P(C|B)=
onion
In a single experiment, we observe that when two events A and B are
mutually exclusive, A ∩ B = ∅ and, hence, P(A | B) = 0. Also, if A is a subset
of B, then A ∩ B = A and, hence,

P(A | B) = \frac{P(A)}{P(B)}

On the other hand, if B is a subset of A, we have A ∩ B = B and, hence,

P(A | B) = \frac{P(B)}{P(B)} = 1

An extremely useful relationship for conditional probabilities is Bayes'
theorem, which states that if A_i, i = 1, 2, ..., n, are mutually exclusive events
such that

\bigcup_{i=1}^{n} A_i = S

and B is an arbitrary event with nonzero probability, then

P(A_i | B) = \frac{P(A_i, B)}{P(B)} = \frac{P(B | A_i) P(A_i)}{\sum_{j=1}^{n} P(B | A_j) P(A_j)}    (2-1-14)

We use this formula in Chapter 5 to derive the structure of the optimum
receiver for a digital communication system in which the events A_i, i =
1, 2, ..., n, represent the possible transmitted messages in a given time
interval, P(A_i) represent their a priori probabilities, B represents the received
signal, which consists of the transmitted message (one of the A_i) corrupted by
noise, and P(A_i | B) is the a posteriori probability of A_i conditioned on having
observed the received signal B.
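As a concrete illustration of how (2-1-14) is used in detection, the short sketch below computes a posteriori probabilities for a hypothetical binary antipodal transmission observed in gaussian noise. The symbol amplitudes, priors, and noise standard deviation are illustrative assumptions, not values taken from the text.

```python
import numpy as np

# Hypothetical setup: two equally likely messages A_1, A_2 mapped to amplitudes +1, -1,
# observed as B = amplitude + gaussian noise.  All numbers are illustrative.
amplitudes = np.array([+1.0, -1.0])   # message amplitudes for A_1, A_2
priors = np.array([0.5, 0.5])         # a priori probabilities P(A_i)
sigma = 0.8                           # assumed noise standard deviation

def posterior(b):
    """Return P(A_i | B = b) for i = 1, 2 via Bayes' theorem (2-1-14)."""
    # Likelihoods P(B | A_i): gaussian noise pdf evaluated at b - amplitude
    likelihood = np.exp(-(b - amplitudes) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    unnormalized = likelihood * priors
    return unnormalized / unnormalized.sum()   # denominator = sum_j P(B|A_j) P(A_j)

print(posterior(0.3))   # the message whose amplitude is closer to b receives the larger posterior
```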
Statistical Independence The statistical independence of two or more
events is another important concept in probability theory. It usually arises
when we consider two or more experiments or repeated trials of a single
experiment. To explain this concept, we consider two events A and B and their
conditional probability P(A | B), which is the probability of occurrence of A
given that B has occurred. Suppose that the occurrence of A does not depend
on the occurrence of B. That is,
P(A | B) = P(A) (2-1-15)
Substitution of (2-1-15) into (2-1-13) yields the result
P(A, B) = P(A)P(B) (2-1-16)
That is, the joint probability of the events A and B factors into the product of
the elementary or marginal probabilities P(A) and P(B). When the events A
and B satisfy the relation in (2-1-16), they are said to be statistically
independent.
For example, consider two successive experiments in tossing a die. Let A
represent the even-numbered sample points {2,4,6} in the first toss and B
represent the even-numbered possible outcomes {2, 4,6} in the second toss. In
a fair die, we assign the probabilities P(A) = 1/2 and P(B) = 1/2. Now, the joint
probability of the joint event "even-numbered outcome on the first toss and
even-numbered outcome on the second toss" is just the probability of the nine
pairs of outcomes (i, j), i = 2, 4, 6, j = 2, 4, 6, which is 9/36 = 1/4. Also,

P(A, B) = P(A)P(B) = 1/4
Thus, the events A and B are statistically independent. Similarly, we may say
that the outcomes of the two experiments are statistically independent.
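A quick numerical check of this factorization is sketched below; the trial count is an arbitrary choice, and the estimates only approximate the exact value 1/4.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200_000                      # arbitrary number of simulated double tosses
first = rng.integers(1, 7, n_trials)    # first toss of a fair die
second = rng.integers(1, 7, n_trials)   # second toss

a = (first % 2 == 0)                    # event A: first toss is even
b = (second % 2 == 0)                   # event B: second toss is even

print(np.mean(a & b))                   # estimate of P(A, B), close to 1/4
print(np.mean(a) * np.mean(b))          # estimate of P(A)P(B), also close to 1/4
```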
The definition of statistical independence can be extended to three or more
events. Three statistically independent events A_1, A_2, and A_3 must satisfy the
following conditions:

P(A_1, A_2) = P(A_1)P(A_2)
P(A_1, A_3) = P(A_1)P(A_3)
P(A_2, A_3) = P(A_2)P(A_3)    (2-1-17)
P(A_1, A_2, A_3) = P(A_1)P(A_2)P(A_3)

In the general case, the events A_i, i = 1, 2, ..., n, are statistically independent
provided that the probabilities of the joint events taken 2, 3, 4, ..., and n at a
time factor into the product of the probabilities of the individual events.
2-1-1 Random Variables, Probability Distributions, and
Probability Densities
Given an experiment having a sample space S and elements s ∈ S, we define a
function X(s) whose domain is S and whose range is a set of numbers on the
real line. The function X(s) is called a random variable. For example, if we flip
a coin, the possible outcomes are head (H) and tail (T), so S contains two
points labeled H and T. Suppose we define a function X(s) such that

X(s) = \begin{cases} 1 & (s = H) \\ -1 & (s = T) \end{cases}    (2-1-18)

Thus we have mapped the two possible outcomes of the coin-flipping
experiment into the two points (±1) on the real line. Another experiment is
the toss of a die with possible outcomes S = {1, 2, 3, 4, 5, 6}. A random variable
defined on this sample space may be X(s) = s, in which case the outcomes of
the experiment are mapped into the integers 1, ..., 6, or, perhaps, X(s) = s²,
in which case the possible outcomes are mapped into the integers
{1, 4, 9, 16, 25, 36}. These are examples of discrete random variables.
Although we have used as examples experiments that have a finite set of
possible outcomes, there are many physical systems (experiments) that
generate continuous outputs (outcomes). For example, the noise voltage
generated by an electronic amplifier has a continuous amplitude. Conse-
quently, the sample space S of voltage amplitudes v € S is continuous and so is
the mapping X(v) = v. In such a case, the random variable X is said to be a
continuous random variable.
Given a random variable X, let us consider the event {X ≤ x}, where x is any
real number. The probability of this event, F(x) = P(X ≤ x), is the cumulative
distribution function (cdf) of X, and its derivative p(x) = dF(x)/dx is the
probability density function (pdf) of X. When the
random variable is discrete or of a mixed type, the pdf contains impulses at the
points of discontinuity of F(x). In such cases, the discrete part of p(x) may be
expressed as

p(x) = \sum_{i=1}^{n} P(X = x_i)\, \delta(x - x_i)    (2-1-2)

where x_i, i = 1, 2, ..., n, are the possible discrete values of the random
variable; P(X = x_i), i = 1, 2, ..., n, are the probabilities; and δ(x) denotes an
impulse at x = 0.

FIGURE 2-1-1  An example of the cumulative distribution function of a random variable of a mixed type.
Often we are faced with the problem of determining the probability that a
random variable X falls in an interval (x_1, x_2), where x_2 > x_1. Since the event
{X ≤ x_2} contains the event {X ≤ x_1}, this probability is

P(x_1 < X ≤ x_2) = F(x_2) - F(x_1)

For a pair of random variables (X_1, X_2) with joint cdf F(x_1, x_2), joint pdf
p(x_1, x_2), and marginal pdf p(x_2), we may also define the cdf of X_1
conditioned on the event {x_2 - Δx_2 < X_2 ≤ x_2} as

P(X_1 ≤ x_1 | x_2 - Δx_2 < X_2 ≤ x_2) = \frac{F(x_1, x_2) - F(x_1, x_2 - Δx_2)}{F(x_2) - F(x_2 - Δx_2)}    (2-1-31)

Assuming that the pdfs p(x_1, x_2) and p(x_2) are continuous functions over the
interval (x_2 - Δx_2, x_2), we may divide both numerator and denominator in
(2-1-31) by Δx_2 and take the limit as Δx_2 → 0. Thus we obtain

P(X_1 ≤ x_1 | X_2 = x_2) ≡ F(x_1 | x_2) = \frac{\partial F(x_1, x_2)/\partial x_2}{\partial F(x_2)/\partial x_2}
  = \frac{\partial\left[\int_{-\infty}^{x_1} \int_{-\infty}^{x_2} p(u_1, u_2)\, du_2\, du_1\right]/\partial x_2}{\partial\left[\int_{-\infty}^{x_2} p(u_2)\, du_2\right]/\partial x_2}
  = \frac{\int_{-\infty}^{x_1} p(u_1, x_2)\, du_1}{p(x_2)}    (2-1-32)
which is the conditional cdf of the random variable X_1 given the random
variable X_2. We observe that F(-∞ | x_2) = 0 and F(∞ | x_2) = 1. By
differentiating (2-1-32) with respect to x_1, we obtain the corresponding pdf
p(x_1 | x_2) in the form

p(x_1 | x_2) = \frac{p(x_1, x_2)}{p(x_2)}    (2-1-33)

Alternatively, we may express the joint pdf p(x_1, x_2) in terms of the
conditional pdfs, p(x_1 | x_2) or p(x_2 | x_1), as

p(x_1, x_2) = p(x_1 | x_2)\, p(x_2) = p(x_2 | x_1)\, p(x_1)    (2-1-34)

The extension of the relations given above to multidimensional random
variables is also easily accomplished. Beginning with the joint pdf of the
random variables X_i, i = 1, 2, ..., n, we may write

p(x_1, x_2, ..., x_n) = p(x_1, x_2, ..., x_k | x_{k+1}, ..., x_n)\, p(x_{k+1}, ..., x_n)    (2-1-35)

where k is any integer in the range 1 < k < n. The joint conditional cdf
corresponding to the conditional pdf p(x_1, ..., x_k | x_{k+1}, ..., x_n) is

F(x_1, x_2, ..., x_k | x_{k+1}, ..., x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} p(u_1, ..., u_k | x_{k+1}, ..., x_n)\, du_1 \cdots du_k    (2-1-36)

This conditional cdf satisfies the properties previously established for these
functions, such as

F(∞, x_2, ..., x_k | x_{k+1}, ..., x_n) = F(x_2, ..., x_k | x_{k+1}, ..., x_n)
F(-∞, x_2, ..., x_k | x_{k+1}, ..., x_n) = 0
Statistically Independent Random Variables. We have already defined
statistical independence of two or more events of a sample space S. The
concept of statistical independence can be extended to random variables
defined on a sample space generated by a combined experiment or by repeated
trials of a single experiment. If the experiments result in mutually exclusive
outcomes, the probability of an outcome in one experiment is independent of
an outcome in any other experiment. That is, the joint probability of the
outcomes factors into a product of the probabilities corresponding to each
outcome. Consequently, the random variables corresponding to the outcomes
in these experiments are independent in the sense that their joint pdf factors
into a product of marginal pdfs. Hence the multidimensional random variables
are statistically independent if and only if
F(x_1, x_2, ..., x_n) = F(x_1) F(x_2) \cdots F(x_n)    (2-1-37)

or, alternatively,

p(x_1, x_2, ..., x_n) = p(x_1) p(x_2) \cdots p(x_n)    (2-1-38)
2-1-2 Functions of Random Variables
A problem that arises frequently in practical applications of probability is the
following. Given a random variable X, which is characterized by its pdf p(x),
determine the pdf of the random variable Y = g(X), where g(X) is some given
function of X. When the mapping g from X to Y is one-to-one, the
determination of p(y) is relatively straightforward. However, when the
mapping is not one-to-one, as is the case, for example, when Y = X?, we must
be very careful in our derivation of p(y).
Example 2-1-1
Consider the random variable Y defined as
Y=aX+b (2-1-39)
where a and b are constants. We assume that a > 0. If a < 0, the approach is
similar (see Problem 2-3). We note that this mapping, illustrated in Fig.
2-1-4(a), is linear and monotonic. Let F_X(x) and F_Y(y) denote the cdfs for X
and Y, respectively. Then

F_Y(y) = P(Y ≤ y) = P(aX + b ≤ y) = P\left(X ≤ \frac{y - b}{a}\right) = F_X\left(\frac{y - b}{a}\right)    (2-1-40)

FIGURE 2-1-4  A linear transformation of a random variable X and an example of the corresponding pdfs of X and Y.

By differentiating (2-1-40) with respect to y, we obtain the relationship
between the respective pdfs. It is

p_Y(y) = \frac{1}{a}\, p_X\left(\frac{y - b}{a}\right)    (2-1-41)

Thus (2-1-40) and (2-1-41) specify the cdf and pdf of the random variable Y
in terms of the cdf and pdf of the random variable X for the linear
transformation in (2-1-39). To illustrate this mapping for a specific pdf
p_X(x), consider the one shown in Fig. 2-1-4(b). The pdf p_Y(y) that results
from the mapping in (2-1-39) is shown in Fig. 2-1-4(c).
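The relationship (2-1-41) is easy to check numerically. The sketch below, offered only as an illustration, draws samples of an exponential X (an arbitrary choice of p_X), applies Y = aX + b, and compares a histogram of Y against (1/a) p_X((y - b)/a).

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 1.0                                # illustrative constants with a > 0
x = rng.exponential(scale=1.0, size=200_000)   # samples of X with p_X(x) = e^{-x}, x >= 0
y = a * x + b                                  # the linear transformation Y = aX + b

# Compare the histogram of Y with the predicted pdf (1/a) * p_X((y - b)/a)
hist, edges = np.histogram(y, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
predicted = (1.0 / a) * np.exp(-(centers - b) / a)   # (2-1-41) for the exponential p_X

print(np.max(np.abs(hist - predicted)))        # small value: histogram matches the predicted pdf
```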
Example 2-1-2
Consider the random variable Y defined as
Y = aX³ + b,   a > 0    (2-1-42)

As in Example 2-1-1, the mapping between X and Y is one-to-one. Hence

F_Y(y) = P(Y ≤ y) = P\left(X ≤ \left(\frac{y - b}{a}\right)^{1/3}\right) = F_X\left(\left(\frac{y - b}{a}\right)^{1/3}\right)    (2-1-43)

and the pdf of Y follows by differentiating with respect to y.

Example 2-1-3

Consider the random variable Y defined as

Y = aX² + b,   a > 0    (2-1-45)

In contrast to Examples 2-1-1 and 2-1-2, the mapping between X and Y,
illustrated in Fig. 2-1-5, is not one-to-one. To determine the cdf of Y, we
observe that

F_Y(y) = P(Y ≤ y) = P(aX² + b ≤ y) = P\left(|X| ≤ \sqrt{\frac{y - b}{a}}\right) = F_X\left(\sqrt{\frac{y - b}{a}}\right) - F_X\left(-\sqrt{\frac{y - b}{a}}\right)    (2-1-46)

Differentiating with respect to y, we obtain the pdf of Y in the form

p_Y(y) = \frac{p_X\left(\sqrt{(y - b)/a}\right) + p_X\left(-\sqrt{(y - b)/a}\right)}{2a\sqrt{(y - b)/a}},   y ≥ b    (2-1-47)

For a linear transformation Y = AX of n random variables, where A is a
nonsingular matrix with inverse A^{-1} = \{b_{ij}\}, the corresponding result for the
joint pdf of Y is

p(y_1, y_2, ..., y_n) = \frac{p\left(x_1 = \sum_j b_{1j} y_j,\; x_2 = \sum_j b_{2j} y_j,\; ...,\; x_n = \sum_j b_{nj} y_j\right)}{|\det A|}    (2-1-60)
2-1-3 Statistical Averages of Random Variables
Averages play an important role in the characterization of the outcomes of
experiments and the random variables defined on the sample space of the
experiments. Of particular interest are the first and second moments of a single
random variable and the joint moments, such as the correlation and covari-
ance, between any pair of random variables in a multidimensional set of
random variables. Also of great importance are the characteristic function for a
single random variable and the joint characteristic function for a multidimen-
sional set of random variables. This section is devoted to the definition of these
important statistical averages.
First we consider a single random variable X characterized by its pdf p(x).
The mean or expected value of X is defined as
E(X) = m_x = \int_{-\infty}^{\infty} x\, p(x)\, dx    (2-1-61)

where E( ) denotes expectation (statistical averaging). This is the first moment
of the random variable X. In general, the nth moment is defined as

E(X^n) = \int_{-\infty}^{\infty} x^n p(x)\, dx    (2-1-62)

Now, suppose that we define a random variable Y = g(X), where g(X) is
some arbitrary function of the random variable X. The expected value of Y is

E(Y) = E[g(X)] = \int_{-\infty}^{\infty} g(x) p(x)\, dx    (2-1-63)

In particular, if Y = (X - m_x)^n, where m_x is the mean value of X, then

E(Y) = E[(X - m_x)^n] = \int_{-\infty}^{\infty} (x - m_x)^n p(x)\, dx    (2-1-64)

This expected value is called the nth central moment of the random variable X,
because it is a moment taken relative to the mean. When n = 2, the central
moment is called the variance of the random variable and denoted as σ_x².
That is,

σ_x^2 = \int_{-\infty}^{\infty} (x - m_x)^2 p(x)\, dx    (2-1-65)

This parameter provides a measure of the dispersion of the random variable X.
By expanding the term (x - m_x)^2 in the integral of (2-1-65) and noting that the
expected value of a constant is equal to the constant, we obtain the expression
that relates the variance to the first and second moments, namely,

σ_x^2 = E(X^2) - [E(X)]^2 = E(X^2) - m_x^2    (2-1-66)
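As a small numerical illustration of (2-1-66), the sketch below (with an arbitrarily chosen discrete distribution) checks that E(X²) - m_x² equals the directly computed second central moment.

```python
import numpy as np

# Arbitrary discrete random variable used only for illustration
values = np.array([-1.0, 0.0, 2.0, 5.0])
probs = np.array([0.2, 0.3, 0.4, 0.1])                 # probabilities sum to 1

m_x = np.sum(values * probs)                           # first moment E(X)
second_moment = np.sum(values**2 * probs)              # E(X^2)
variance_direct = np.sum((values - m_x)**2 * probs)    # second central moment

print(variance_direct, second_moment - m_x**2)         # both give the same variance, per (2-1-66)
```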
In the case of two random variables, X, and X2, with joint pdf p(x,, x2), we
define the joint moment as

E(X_1^k X_2^n) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1^k x_2^n\, p(x_1, x_2)\, dx_1\, dx_2    (2-1-67)

and the joint central moment as

E[(X_1 - m_1)^k (X_2 - m_2)^n] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_1 - m_1)^k (x_2 - m_2)^n\, p(x_1, x_2)\, dx_1\, dx_2    (2-1-68)

where m_i = E(X_i). Of particular importance to us are the joint moment and
joint central moment corresponding to k = n = 1. These joint moments are
called the correlation and the covariance of the random variables X_1 and X_2,
respectively.
In considering multidimensional random variables, we can define joint
moments of any order. However, the moments that are most useful in practical
applications are the correlations and covariances between pairs of random
variables. To elaborate, suppose that X_i, i = 1, 2, ..., n, are random variables
with joint pdf p(x_1, x_2, ..., x_n). Let p(x_i, x_j) be the joint pdf of the random
variables X_i and X_j. Then the correlation between X_i and X_j is given by the
joint moment

E(X_i X_j) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_i x_j\, p(x_i, x_j)\, dx_i\, dx_j    (2-1-69)

and the covariance of X_i and X_j is

μ_{ij} = E[(X_i - m_i)(X_j - m_j)]
      = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_i - m_i)(x_j - m_j)\, p(x_i, x_j)\, dx_i\, dx_j
      = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_i x_j\, p(x_i, x_j)\, dx_i\, dx_j - m_i m_j
      = E(X_i X_j) - m_i m_j    (2-1-70)

The n × n matrix with elements μ_{ij} is called the covariance matrix of the
random variables X_i, i = 1, 2, ..., n. We shall encounter the covariance matrix
in our discussion of jointly gaussian random variables in Section 2-1-4.
Two random variables are said to be uncorrelated if E(X_i X_j) =
E(X_i)E(X_j) = m_i m_j. In that case, the covariance μ_{ij} = 0. We note that when X_i
and X_j are statistically independent, they are also uncorrelated. However, if X_i
and X_j are uncorrelated, they are not necessarily statistically independent.
Two random variables are said to be orthogonal if E(X_i X_j) = 0. We note
that this condition holds when X_i and X_j are uncorrelated and either one or
both of the random variables have zero mean.
Characteristic Functions The characteristic function of a random variable
X is defined as the statistical average
E(e^{jvX}) ≡ ψ(jv) = \int_{-\infty}^{\infty} e^{jvx} p(x)\, dx    (2-1-71)

where the variable v is real and j = \sqrt{-1}. We note that ψ(jv) may be described
as the Fourier transform† of the pdf p(x). Hence the inverse Fourier trans-
form is

p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} ψ(jv)\, e^{-jvx}\, dv    (2-1-72)

One useful property of the characteristic function is its relation to the
moments of the random variable. We note that the first derivative of (2-1-71)
with respect to v yields

\frac{dψ(jv)}{dv} = j \int_{-\infty}^{\infty} x\, e^{jvx} p(x)\, dx

By evaluating the derivative at v = 0, we obtain the first moment (mean)

E(X) = m_x = -j \left. \frac{dψ(jv)}{dv} \right|_{v=0}    (2-1-73)

The differentiation process can be repeated, so that the nth derivative of ψ(jv)
evaluated at v = 0 yields the nth moment

E(X^n) = (-j)^n \left. \frac{d^nψ(jv)}{dv^n} \right|_{v=0}    (2-1-74)

Thus the moments of a random variable can be determined from the
characteristic function. On the other hand, suppose that the characteristic
function can be expanded in a Taylor series about the point v = 0. That is,

ψ(jv) = \sum_{n=0}^{\infty} \left[\frac{d^nψ(jv)}{dv^n}\right]_{v=0} \frac{v^n}{n!}    (2-1-75)

Using the relation in (2-1-74) to eliminate the derivative in (2-1-75), we obtain

† Usually, the Fourier transform of a function g(u) is defined as G(v) = \int_{-\infty}^{\infty} g(u) e^{-jvu}\, du, which
differs from (2-1-71) by the negative sign in the exponential. This is a trivial difference, however, so
we call the integral in (2-1-71) a Fourier transform.

an expression for the characteristic function in terms of its moments in the
form

ψ(jv) = \sum_{n=0}^{\infty} E(X^n) \frac{(jv)^n}{n!}    (2-1-76)
The characteristic function provides a simple method for determining the
pdf of a sum of statistically independent random variables. To illustrate this
point, let X_i, i = 1, 2, ..., n, be a set of n statistically independent random
variables and let

Y = \sum_{i=1}^{n} X_i    (2-1-77)

The problem is to determine the pdf of Y. We shall determine the pdf of Y by
first finding its characteristic function and then computing the inverse Fourier
transform. Thus

ψ_Y(jv) = E(e^{jvY})
        = E\left[\exp\left(jv \sum_{i=1}^{n} X_i\right)\right]
        = E\left[\prod_{i=1}^{n} e^{jvX_i}\right]
        = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left(\prod_{i=1}^{n} e^{jvx_i}\right) p(x_1, x_2, ..., x_n)\, dx_1\, dx_2 \cdots dx_n    (2-1-78)

Since the random variables are statistically independent, p(x_1, x_2, ..., x_n) =
p(x_1)p(x_2) \cdots p(x_n) and, hence, the nth-order integral in (2-1-78) reduces to a
product of n single integrals, each corresponding to the characteristic function
of one of the X_i. Hence,

ψ_Y(jv) = \prod_{i=1}^{n} ψ_{X_i}(jv)    (2-1-79)

If, in addition to their statistical independence, the X_i are identically
distributed, then all the ψ_{X_i}(jv) are identical. Consequently,

ψ_Y(jv) = [ψ_X(jv)]^n    (2-1-80)

Finally, the pdf of Y is determined from the inverse Fourier transform of
ψ_Y(jv), given by (2-1-72).
Since the characteristic function of the sum of n statistically independent
random variables is equal to the product of the characteristic functions of the
individual random variables X_i, i = 1, 2, ..., n, it follows that the pdf of Y is
the n-fold convolution of the pdfs of the X_i. Usually
the n-fold convolution is more difficult to perform than the characteristic
function method described above in determining the pdf of Y.
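The sketch below illustrates this point numerically under assumed distributions: the pdf of a sum of three independent uniform random variables is obtained by repeated numerical convolution of the individual pdfs, which is the operation that the product of characteristic functions represents.

```python
import numpy as np

# pdf of a single X_i, assumed uniform on [0, 1) and sampled on a fine grid
dx = 0.001
x = np.arange(0.0, 1.0, dx)
p_x = np.ones_like(x)                    # uniform pdf, height 1 on [0, 1)

# pdf of Y = X_1 + X_2 + X_3 via two numerical convolutions (the n-fold convolution for n = 3)
p_y = np.convolve(np.convolve(p_x, p_x) * dx, p_x) * dx

y = np.arange(len(p_y)) * dx
print(p_y.sum() * dx)                    # integrates to ~1, as a pdf must
print(y[np.argmax(p_y)])                 # peak near 1.5, the mean of the sum
```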
When working with n-dimensional random variables, it is appropriate to
define an n-dimensional Fourier transform of the joint pdf. In particular, if
X_i, i = 1, 2, ..., n, are random variables with pdf p(x_1, x_2, ..., x_n), the
n-dimensional characteristic function is defined as

ψ(jv_1, jv_2, ..., jv_n) = E\left[\exp\left(j \sum_{i=1}^{n} v_i X_i\right)\right]
  = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left(j \sum_{i=1}^{n} v_i x_i\right) p(x_1, x_2, ..., x_n)\, dx_1\, dx_2 \cdots dx_n    (2-1-81)

Of special interest is the two-dimensional characteristic function

ψ(jv_1, jv_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{j(v_1 x_1 + v_2 x_2)} p(x_1, x_2)\, dx_1\, dx_2    (2-1-82)

We observe that the partial derivatives of ψ(jv_1, jv_2) with respect to v_1 and v_2
can be used to generate the joint moments. For example, it is easy to show that

E(X_1 X_2) = -\left. \frac{\partial^2 ψ(jv_1, jv_2)}{\partial v_1\, \partial v_2} \right|_{v_1 = v_2 = 0}    (2-1-83)
Higher-order moments are generated in a straightforward manner.
2-1-4 Some Useful Probability Distributions
In subsequent chapters, we shall encounter several different types of random
variables. In this section we list these frequently encountered random
variables, their pdfs, their cdfs, and their moments. We begin with the binomial
distribution, which is the distribution of a discrete random variable, and then
we present the distributions of several continuous random variables.
Binomial Distribution Let X be a discrete random variable that has two
possible values, say X =1 or X=0, with probabilities p and 1 —p,
respectively. The pdf of X is shown in Fig. 2-1-6. Now, suppose that
Y = \sum_{i=1}^{n} X_i

where the X_i, i = 1, 2, ..., n, are statistically independent and identically
distributed random variables with the pdf shown in Fig. 2-1-6. What is the
probability distribution function of Y?

FIGURE 2-1-6  The probability distribution function of X.
To answer this question, we observe that the range of Y is the set of
integers from 0 to n. The probability that Y = 0 is simply the probability that
all the X; = 0. Since the X, are statistically independent,
P(Y = 0) = (1 - p)^n

The probability that Y = 1 is simply the probability that one X_i = 1 and the rest
of the X_i = 0. Since this event can occur in n different ways,

P(Y = 1) = np(1 - p)^{n-1}

To generalize, the probability that Y = k is the probability that k of the X_i are
equal to one and n - k are equal to zero. Since there are

\binom{n}{k} = \frac{n!}{k!\,(n - k)!}    (2-1-84)

different combinations that result in the event {Y = k}, it follows that

P(Y = k) = \binom{n}{k} p^k (1 - p)^{n-k}    (2-1-85)

where \binom{n}{k} is the binomial coefficient. Consequently, the pdf of Y may be
expressed as

p(y) = \sum_{k=0}^{n} P(Y = k)\, \delta(y - k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k}\, \delta(y - k)    (2-1-86)

The cdf of Y is

F(y) = P(Y ≤ y) = \sum_{k=0}^{[y]} \binom{n}{k} p^k (1 - p)^{n-k}    (2-1-87)

where [y] denotes the largest integer m such that m ≤ y.

Gaussian Distribution  The pdf of a gaussian random variable X with mean m_x
and variance σ² is

p(x) = \frac{1}{\sqrt{2\pi}\,σ} e^{-(x - m_x)^2/2σ^2}    (2-1-92)

and its cdf may be expressed in terms of the error function erf(x) and the
complementary error function

erfc(x) = 1 - erf(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty} e^{-t^2}\, dt    (2-1-95)

For x > m_x, the complementary error function is
proportional to the area under the tail of the gaussian pdf. For large values of
x, the complementary error function erfc(x) may be approximated by the
asymptotic series

erfc(x) ≈ \frac{e^{-x^2}}{x\sqrt{\pi}} \left(1 - \frac{1}{2x^2} + \frac{1 \cdot 3}{(2x^2)^2} - \frac{1 \cdot 3 \cdot 5}{(2x^2)^3} + \cdots\right)    (2-1-96)

where the approximation error is less than the last term used.
The function that is frequently used for the area under the tail of the
gaussian pdf is denoted by Q(x) and defined as

Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-t^2/2}\, dt,   x ≥ 0    (2-1-97)

By comparing (2-1-95) with (2-1-97), we find

Q(x) = \tfrac{1}{2}\, erfc\left(\frac{x}{\sqrt{2}}\right)    (2-1-98)
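A hedged numerical sketch of (2-1-97) and (2-1-98): it evaluates Q(x) as ½ erfc(x/√2) using scipy's erfc, and checks the result by Monte Carlo counting of gaussian samples in the tail; the sample size and evaluation point are arbitrary.

```python
import numpy as np
from scipy.special import erfc

rng = np.random.default_rng(2)

def q_function(x):
    """Tail probability of a zero-mean, unit-variance gaussian, via (2-1-98)."""
    return 0.5 * erfc(x / np.sqrt(2.0))

x0 = 2.0
samples = rng.standard_normal(1_000_000)        # arbitrary number of unit-variance gaussian samples
monte_carlo = np.mean(samples > x0)             # fraction of samples in the tail beyond x0

print(q_function(x0), monte_carlo)              # both approximately 0.0228
```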
The characteristic function of a gaussian random variable with mean m, and
variance σ² is

ψ(jv) = \int_{-\infty}^{\infty} e^{jvx} \frac{1}{\sqrt{2\pi}\,σ} e^{-(x - m_x)^2/2σ^2}\, dx = e^{jvm_x - v^2σ^2/2}    (2-1-99)

The central moments of a gaussian random variable are

E[(X - m_x)^k] ≡ μ_k = \begin{cases} 1 \cdot 3 \cdots (k - 1)\,σ^k & (\text{even } k) \\ 0 & (\text{odd } k) \end{cases}    (2-1-100)

and the ordinary moments may be expressed in terms of the central moments
as

E(X^k) = \sum_{i=0}^{k} \binom{k}{i} m_x^i\, μ_{k-i}    (2-1-101)
The sum of n statistically independent gaussian random variables is also a
gaussian random variable. To demonstrate this point, let

Y = \sum_{i=1}^{n} X_i    (2-1-102)

where the X_i, i = 1, 2, ..., n, are statistically independent gaussian random
variables with means m_i and variances σ_i². Using the result in (2-1-79), we find
that the characteristic function of Y is

ψ_Y(jv) = \prod_{i=1}^{n} ψ_{X_i}(jv) = \prod_{i=1}^{n} e^{jvm_i - v^2σ_i^2/2} = e^{jvm_y - v^2σ_y^2/2}    (2-1-103)

where

m_y = \sum_{i=1}^{n} m_i,   σ_y^2 = \sum_{i=1}^{n} σ_i^2    (2-1-104)

Therefore, Y is gaussian-distributed with mean m_y and variance σ_y².
Chi-Square Distribution A chi-square-distributed random variable is re-
lated to a gaussian-distributed random variable in the sense that the former can
be viewed as a transformation of the latter. To be specific, let Y = X², where X
is a gaussian random variable. Then Y has a chi-square distribution. We
distinguish between two types of chi-square distributions. The first is called a42 DIGIAL COMMUNICATIONS
central chi-square distribution and is obtained when X has zero mean. The
second is called a non-central chi-square distribution, and is obtained when X
has a nonzero mean.
First we consider the central chi-square distribution. Let X be gaussian-
distributed with zero mean and variance o”. Since Y = X?, the result given in
(2-1-47) applies directly with a = 1 and b = 0. Thus we obtain the pdf of Y in
the form

p_Y(y) = \frac{1}{\sqrt{2\pi y}\,σ} e^{-y/2σ^2},   y ≥ 0    (2-1-105)

The cdf of Y is

F_Y(y) = \int_{0}^{y} p_Y(u)\, du = \frac{1}{\sqrt{2\pi}\,σ} \int_{0}^{y} \frac{1}{\sqrt{u}} e^{-u/2σ^2}\, du    (2-1-106)

which cannot be expressed in closed form. The characteristic function,
however, can be determined in closed form. It is

ψ_Y(jv) = \frac{1}{(1 - j2vσ^2)^{1/2}}    (2-1-107)

Now, suppose that the random variable Y is defined as

Y = \sum_{i=1}^{n} X_i^2    (2-1-108)

where the X_i, i = 1, 2, ..., n, are statistically independent and identically
distributed gaussian random variables with zero mean and variance σ². As a
consequence of the statistical independence of the X_i, the characteristic
function of Y is

ψ_Y(jv) = \frac{1}{(1 - j2vσ^2)^{n/2}}    (2-1-109)

The inverse transform of this characteristic function yields the pdf

p_Y(y) = \frac{1}{σ^n 2^{n/2}\, Γ(\tfrac{1}{2}n)}\, y^{n/2 - 1} e^{-y/2σ^2},   y ≥ 0    (2-1-110)
where Γ(p) is the gamma function, defined as

Γ(p) = \int_{0}^{\infty} t^{p-1} e^{-t}\, dt,   p > 0
Γ(p) = (p - 1)!,   p an integer, p > 0    (2-1-111)
Γ(\tfrac{1}{2}) = \sqrt{\pi},   Γ(\tfrac{3}{2}) = \tfrac{1}{2}\sqrt{\pi}

This pdf, which is a generalization of (2-1-105), is called a chi-square (or
gamma) pdf with n degrees of freedom. It is illustrated in Fig. 2-1-9. The case
n = 2 yields the exponential distribution.

FIGURE 2-1-9  The pdf of a chi-square-distributed random variable for several degrees of freedom.
The first two moments of Y are

E(Y) = nσ^2
E(Y^2) = 2nσ^4 + n^2σ^4    (2-1-112)
σ_y^2 = 2nσ^4

The cdf of Y is

F_Y(y) = \int_{0}^{y} \frac{1}{σ^n 2^{n/2}\, Γ(\tfrac{1}{2}n)}\, u^{n/2 - 1} e^{-u/2σ^2}\, du,   y ≥ 0    (2-1-113)
This integral can be easily manipulated into the form of the incomplete gamma
function, which is tabulated by Pearson (1965). When n is even, the integral in
(2-1-113) can be expressed in closed form. Specifically, let m = \tfrac{1}{2}n, where m is
an integer. Then, by repeated integration by parts, we obtain

F_Y(y) = 1 - e^{-y/2σ^2} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2σ^2}\right)^k,   y ≥ 0    (2-1-114)
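As an illustrative check of (2-1-114), the sketch below compares the closed-form cdf for an even number of degrees of freedom against a Monte Carlo estimate obtained by summing squared zero-mean gaussian variables; n, σ, and the evaluation point are arbitrary choices.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n, sigma, y0 = 4, 1.5, 10.0            # even n, standard deviation sigma, evaluation point (all assumed)
m = n // 2

# Closed-form cdf (2-1-114): 1 - exp(-y/2s^2) * sum_{k=0}^{m-1} (y/2s^2)^k / k!
u = y0 / (2 * sigma**2)
cdf_closed = 1.0 - math.exp(-u) * sum(u**k / math.factorial(k) for k in range(m))

# Monte Carlo estimate of P(Y <= y0) with Y = sum of n squared zero-mean gaussians
x = sigma * rng.standard_normal((200_000, n))
y = np.sum(x**2, axis=1)
print(cdf_closed, np.mean(y <= y0))    # the two values should agree closely
```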
Let us now consider a noncentral chi-square distribution, which results from
squaring a gaussian random variable having a nonzero mean. If X is gaussian
with mean m_x and variance σ², the random variable Y = X² has the pdf

p_Y(y) = \frac{1}{\sqrt{2\pi y}\,σ} e^{-(y + m_x^2)/2σ^2} \cosh\left(\frac{\sqrt{y}\, m_x}{σ^2}\right),   y ≥ 0    (2-1-115)

which is obtained by applying the result in (2-1-47) to the gaussian pdf given by
(2-1-92). The characteristic function corresponding to this pdf is

ψ_Y(jv) = \frac{1}{(1 - j2vσ^2)^{1/2}}\, e^{jvm_x^2/(1 - j2vσ^2)}    (2-1-116)
aia44 DIGtTAL COMMUNICATIONS
To generalize these results, let Y be the sum of squares of gaussian random
variables as defined by (2-1-108). The X,, i=1,2,...,n, are assumed to be
Statistically independent with means m,, i= 1,2,...,n, and identical variances
equal to o*. Then the characteristic function of Y, obtained from (2-1-116) by
applying the relation in (2-t-79), is
jv 3 mi
dy Gv) = (2-1-117)
(= pave? *P \T ave?
This characteristic function can be inverse-Fourier-transformed to yield the pdf
" ~2y4
1 ta yytet
Pry) =>3(5 erry, (vys) y=0 (2-111)
207 \s' o
where, by definition,
s7= Dm? (2-1-119)
fl
and [,(x) is the ath-order modified Bessel function of the first kind, which may
be represented by the infinite series
= (4/2)214
1,(«) = > > “1
a(x) Zite+ho0" x20 (2-1-120)
The pdf given by (2-1-118) is called the noncentral chi-square pdf with n
degrees of freedom. The parameter s? is called the noncentrality parameter of
the distribution.
The cdf of the noncentral chi-square with n degrees of freedom is

F_Y(y) = \int_{0}^{y} \frac{1}{2σ^2} \left(\frac{u}{s^2}\right)^{(n-2)/4} e^{-(s^2 + u)/2σ^2}\, I_{n/2 - 1}\left(\frac{\sqrt{u}\, s}{σ^2}\right) du    (2-1-121)

There is no closed-form expression for this integral. However, when m = \tfrac{1}{2}n is
an integer, the cdf can be expressed in terms of the generalized Marcum Q
function, which is defined as

Q_m(a, b) = \int_{b}^{\infty} x \left(\frac{x}{a}\right)^{m-1} e^{-(x^2 + a^2)/2}\, I_{m-1}(ax)\, dx
          = Q_1(a, b) + e^{-(a^2 + b^2)/2} \sum_{k=1}^{m-1} \left(\frac{b}{a}\right)^k I_k(ab)    (2-1-122)

where

Q_1(a, b) = e^{-(a^2 + b^2)/2} \sum_{k=0}^{\infty} \left(\frac{a}{b}\right)^k I_k(ab),   b > a > 0    (2-1-123)
If we change the variable of integration in (2-1-121) from u to x, where
x² = u/σ², and let a² = s²/σ², then it is easily shown that

F_Y(y) = 1 - Q_m\left(\frac{s}{σ}, \frac{\sqrt{y}}{σ}\right)    (2-1-124)

Finally, we state that the first two moments of a noncentral chi-square-
distributed random variable are

E(Y) = nσ^2 + s^2
E(Y^2) = 2nσ^4 + 4σ^2 s^2 + (nσ^2 + s^2)^2    (2-1-125)
σ_y^2 = 2nσ^4 + 4σ^2 s^2
Rayleigh Distribution The Rayleigh distribution is frequently used to
model the statistics of signals transmitted through radio channels such as
cellular radio. This distribution is closely related to the central chi-square
distribution. To illustrate this point, let Y = X_1² + X_2², where X_1 and X_2 are
zero-mean statistically independent gaussian random variables, each having a
variance a”. From the discussion above, it follows that Y is chi-square-
distributed with two degrees of freedom. Hence, the pdf of Y is
p_Y(y) = \frac{1}{2σ^2} e^{-y/2σ^2},   y ≥ 0    (2-1-126)

Now, suppose we define a new random variable

R = \sqrt{X_1^2 + X_2^2} = \sqrt{Y}    (2-1-127)

Making a simple change of variable in the pdf of (2-1-126), we obtain the pdf
of R in the form

p_R(r) = \frac{r}{σ^2} e^{-r^2/2σ^2},   r ≥ 0    (2-1-128)

This is the pdf of a Rayleigh-distributed random variable. The corresponding
cdf is

F_R(r) = \int_{0}^{r} \frac{u}{σ^2} e^{-u^2/2σ^2}\, du = 1 - e^{-r^2/2σ^2},   r ≥ 0    (2-1-129)
The moments of R are
E(R^k) = (2σ^2)^{k/2}\, Γ(1 + \tfrac{1}{2}k)    (2-1-130)

and the variance is

σ_r^2 = (2 - \tfrac{1}{2}\pi)σ^2    (2-1-131)
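The construction of R from two independent zero-mean gaussians suggests a direct simulation; the sketch below (with an arbitrary σ) checks the Rayleigh mean and variance implied by (2-1-130) and (2-1-131).

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 2.0                                        # assumed standard deviation of each gaussian
x1 = sigma * rng.standard_normal(500_000)
x2 = sigma * rng.standard_normal(500_000)
r = np.sqrt(x1**2 + x2**2)                         # Rayleigh-distributed envelope, per (2-1-127)

mean_theory = np.sqrt(2 * sigma**2) * np.sqrt(np.pi) / 2   # (2-1-130) with k = 1, Gamma(3/2) = sqrt(pi)/2
var_theory = (2 - np.pi / 2) * sigma**2                    # (2-1-131)

print(r.mean(), mean_theory)
print(r.var(), var_theory)
```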
The characteristic function of the Rayleigh-distributed random variable is

ψ_R(jv) = \int_{0}^{\infty} \frac{r}{σ^2} e^{-r^2/2σ^2} e^{jvr}\, dr    (2-1-132)

This integral may be expressed as

ψ_R(jv) = \int_{0}^{\infty} \frac{r}{σ^2} e^{-r^2/2σ^2} \cos vr\, dr + j \int_{0}^{\infty} \frac{r}{σ^2} e^{-r^2/2σ^2} \sin vr\, dr
        = {}_1F_1(1, \tfrac{1}{2}; -\tfrac{1}{2}v^2σ^2) + j\sqrt{\tfrac{1}{2}\pi}\, vσ\, e^{-v^2σ^2/2}    (2-1-133)

where {}_1F_1(α, β; x) is the confluent hypergeometric function, which is defined
as

{}_1F_1(α, β; x) = \sum_{k=0}^{\infty} \frac{Γ(α + k)\, Γ(β)}{Γ(α)\, Γ(β + k)} \frac{x^k}{k!},   β ≠ 0, -1, -2, ...    (2-1-134)

Beaulieu (1990) has shown that {}_1F_1(1, \tfrac{1}{2}; -a) may be expressed as

{}_1F_1(1, \tfrac{1}{2}; -a) = -e^{-a} \sum_{k=0}^{\infty} \frac{a^k}{(2k - 1)\, k!}    (2-1-135)
As a generalization of the above expression, consider the random variable
R = \sqrt{\sum_{i=1}^{n} X_i^2}    (2-1-136)

where the X_i, i = 1, 2, ..., n, are statistically independent, identically distrib-
uted zero-mean gaussian random variables. The random variable R has a
generalized Rayleigh distribution. Clearly, Y = R² is chi-square-distributed
with n degrees of freedom. Its pdf is given by (2-1-110). A simple change in
variable in (2-1-110) yields the pdf of R in the form

p_R(r) = \frac{r^{n-1}}{2^{(n-2)/2}\, σ^n\, Γ(\tfrac{1}{2}n)} e^{-r^2/2σ^2},   r ≥ 0    (2-1-137)

As a consequence of the functional relationship between the central
chi-square and the Rayleigh distributions, the corresponding cdfs are similar.
Thus, for any n, the cdf of R can be put in the form of the incomplete gamma
function. In the special case when n is even, i.e., n = 2m, the cdf of R can be
expressed in the closed form

F_R(r) = 1 - e^{-r^2/2σ^2} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{r^2}{2σ^2}\right)^k,   r ≥ 0    (2-1-138)

Finally, we state that the kth moment of R is

E(R^k) = (2σ^2)^{k/2}\, \frac{Γ(\tfrac{1}{2}(n + k))}{Γ(\tfrac{1}{2}n)},   k ≥ 0    (2-1-139)

which holds for any integer n.
Rice Distribution  Just as the Rayleigh distribution is related to the central
chi-square distribution, the Rice distribution is related to the noncentral
chi-square distribution. To illustrate this relation, let Y = X_1² + X_2², where X_1
and X_2 are statistically independent gaussian random variables with means m_i,
i = 1, 2, and common variance σ². From the previous discussion, we know that
Y has a noncentral chi-square distribution with noncentrality parameter
s² = m_1² + m_2². The pdf of Y, obtained from (2-1-118) for n = 2, is

p_Y(y) = \frac{1}{2σ^2} e^{-(s^2 + y)/2σ^2}\, I_0\left(\frac{\sqrt{y}\, s}{σ^2}\right),   y ≥ 0    (2-1-140)

Now, we define a new random variable R = \sqrt{Y}. The pdf of R, obtained
from (2-1-140) by a simple change of variable, is

p_R(r) = \frac{r}{σ^2} e^{-(r^2 + s^2)/2σ^2}\, I_0\left(\frac{rs}{σ^2}\right),   r ≥ 0    (2-1-141)
This is the pdf of a Ricean-distributed random variable. As will be shown in
Chapter 5, this pdf characterizes the statistics of the envelope of a signal
corrupted by additive narrowband gaussian noise. It is also used to model the
signal statistics of signals transmitted through some radio channels. The cdf of
R is easily obtained by specializing the result in (2-1-124) to the case m = 1.
This yields

F_R(r) = 1 - Q_1\left(\frac{s}{σ}, \frac{r}{σ}\right),   r ≥ 0    (2-1-142)

where Q_1(a, b) is defined by (2-1-123).
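The sketch below illustrates (2-1-141) by simulation under assumed parameters: a Ricean envelope is generated from two gaussians with nonzero means, and its histogram is compared against the pdf evaluated with scipy's modified Bessel function i0.

```python
import numpy as np
from scipy.special import i0

rng = np.random.default_rng(5)
m1, m2, sigma = 1.0, 2.0, 0.7                  # assumed means and common standard deviation
s = np.hypot(m1, m2)                           # noncentrality parameter s = sqrt(m1^2 + m2^2)

x1 = m1 + sigma * rng.standard_normal(300_000)
x2 = m2 + sigma * rng.standard_normal(300_000)
r = np.sqrt(x1**2 + x2**2)                     # Ricean-distributed envelope

hist, edges = np.histogram(r, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
pdf = (centers / sigma**2) * np.exp(-(centers**2 + s**2) / (2 * sigma**2)) * i0(centers * s / sigma**2)

print(np.max(np.abs(hist - pdf)))              # small: the histogram tracks (2-1-141)
```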
As a generalization of the expressions given above, let R be defined as in
(2-1-136), where the X_i, i = 1, 2, ..., n, are statistically independent gaussian
random variables with means m_i, i = 1, 2, ..., n, and identical variances equal
to σ². The random variable R² = Y has a noncentral chi-square distribution
with n degrees of freedom and noncentrality parameter s² given by (2-1-119).
Its pdf is given by (2-1-118). Hence the pdf of R is

p_R(r) = \frac{r^{n/2}}{σ^2 s^{(n-2)/2}} e^{-(r^2 + s^2)/2σ^2}\, I_{n/2 - 1}\left(\frac{rs}{σ^2}\right),   r ≥ 0    (2-1-143)

and the corresponding cdf is

F_R(r) = P(R ≤ r) = P(\sqrt{Y} ≤ r) = P(Y ≤ r^2) = 1 - Q_m\left(\frac{s}{σ}, \frac{r}{σ}\right),   m = \tfrac{1}{2}n    (2-1-144)

where Q_m(a, b) is the generalized Marcum Q function defined in (2-1-122).

Nakagami m-Distribution  The Nakagami m-distribution is another model that is
frequently used for the statistics of signals transmitted through multipath fading
channels. Its pdf is parameterized by the fading figure m ≥ 1/2; the case m = 1
reduces to the Rayleigh pdf, while for m > 1 the tail of the pdf
decays faster than that of the Rayleigh. Figure 2-1-10 illustrates the pdfs for
different values of m.
Multivariate Gaussian Distribution Of the many multivariate or multi-
dimensional distributions that can be defined, the multivariate gaussian
distribution is the most important and the one most likely to be encountered in
practice. We shall briefly introduce this distribution and state its basic
properties.
Let us assume that X_i, i = 1, 2, ..., n, are gaussian random variables with
means m_i, i = 1, 2, ..., n, variances σ_i², i = 1, 2, ..., n, and covariances μ_{ij},
i, j = 1, 2, ..., n. Clearly, μ_{ii} = σ_i², i = 1, 2, ..., n. Let M denote the n × n
covariance matrix with elements {μ_{ij}}, let X denote the n × 1 column vector of
random variables, and let m_x denote the n × 1 column vector of mean values
m_i, i = 1, 2, ..., n. The joint pdf of the gaussian random variables X_i,
i = 1, 2, ..., n, is defined as

p(x_1, x_2, ..., x_n) = \frac{1}{(2\pi)^{n/2} (\det M)^{1/2}} \exp\left[-\tfrac{1}{2} (x - m_x)' M^{-1} (x - m_x)\right]    (2-1-150)

where M^{-1} denotes the inverse of M and x' denotes the transpose of x.
The characteristic function corresponding to this n-dimensional joint pdf is

ψ(jv) = E(e^{jv'X})

where v is an n-dimensional vector with elements v_i, i = 1, 2, ..., n.
Evaluation of this n-dimensional Fourier transform yields the result

ψ(jv) = \exp\left(jm_x' v - \tfrac{1}{2} v' M v\right)    (2-1-151)
An important special case of (2-1-150) is the bivariate or two-dimensional
gaussian pdf. The mean m_x and the covariance matrix M for this case are

m_x = \begin{bmatrix} m_1 \\ m_2 \end{bmatrix},   M = \begin{bmatrix} σ_1^2 & μ_{12} \\ μ_{12} & σ_2^2 \end{bmatrix}    (2-1-152)

where the joint central moment μ_{12} is defined as

μ_{12} = E[(X_1 - m_1)(X_2 - m_2)]

It is convenient to define a normalized covariance

ρ_{ij} = \frac{μ_{ij}}{σ_i σ_j},   i ≠ j    (2-1-153)

where ρ_{ij} satisfies the condition 0 ≤ |ρ_{ij}| ≤ 1. When dealing with the two-
dimensional case, it is customary to drop the subscripts on μ_{12} and ρ_{12}. Hence
the covariance matrix is expressed as

M = \begin{bmatrix} σ_1^2 & ρσ_1σ_2 \\ ρσ_1σ_2 & σ_2^2 \end{bmatrix}    (2-1-154)

Its inverse is

M^{-1} = \frac{1}{σ_1^2 σ_2^2 (1 - ρ^2)} \begin{bmatrix} σ_2^2 & -ρσ_1σ_2 \\ -ρσ_1σ_2 & σ_1^2 \end{bmatrix}    (2-1-155)

and det M = σ_1^2 σ_2^2 (1 - ρ^2). Substitution for M^{-1} into (2-1-150) yields the
desired bivariate gaussian pdf in the form

p(x_1, x_2) = \frac{1}{2\pi σ_1 σ_2 \sqrt{1 - ρ^2}}
  \exp\left[-\frac{σ_2^2 (x_1 - m_1)^2 - 2ρσ_1σ_2 (x_1 - m_1)(x_2 - m_2) + σ_1^2 (x_2 - m_2)^2}{2σ_1^2 σ_2^2 (1 - ρ^2)}\right]    (2-1-156)

We note that when ρ = 0, the joint pdf p(x_1, x_2) in (2-1-156) factors into the
product p(x_1)p(x_2), where p(x_i), i = 1, 2, are the marginal pdfs. Since ρ is a
measure of the correlation between X_1 and X_2, we have shown that when the
gaussian random variables X_1 and X_2 are uncorrelated, they are also
statistically independent. This is an important property of gaussian random
variables, which does not hold in general for other distributions. It extends to
n-dimensional gaussian random variables in a straightforward manner. That is,
if ρ_{ij} = 0 for i ≠ j, then the random variables X_i, i = 1, 2, ..., n, are uncorrelated
and, hence, statistically independent.
Now, let us consider a linear transformation of n gaussian random variables
X_i, i = 1, 2, ..., n, with mean vector m_x and covariance matrix M. Let

Y = AX    (2-1-157)

where A is a nonsingular matrix. As shown previously, the jacobian of this
transformation is J = 1/det A. Since X = A^{-1}Y, we may substitute for X in
(2-1-150) and, thus, we obtain the joint pdf of Y in the form

p(y) = \frac{1}{(2\pi)^{n/2} (\det M)^{1/2} |\det A|} \exp\left[-\tfrac{1}{2} (A^{-1}y - m_x)' M^{-1} (A^{-1}y - m_x)\right]
     = \frac{1}{(2\pi)^{n/2} (\det Q)^{1/2}} \exp\left[-\tfrac{1}{2} (y - m_y)' Q^{-1} (y - m_y)\right]    (2-1-158)

where the vector m_y and the matrix Q are defined as

m_y = A m_x
Q = A M A'    (2-1-159)

Thus we have shown that a linear transformation of a set of jointly gaussian
random variables results in another set of jointly gaussian random variables.
Suppose that we wish to perform a linear transformation that results in n
statistically independent gaussian random variables. How should the matrix A
be selected? From our previous discussion, we know that the gaussian random520 DIGITAL COMMUNICATIONS
variables are statistically independent if they are pairwise-uncorrelated, i.e., if
the covariance matrix Q is diagonal. Therefore, we must have
AMA' = D    (2-1-160)
where D is a diagonal matrix. The matrix M is a covariance matrix; hence, it is
positive definite. One solution is to select A to be an orthogonal matrix
(A' = A^{-1}) whose rows are the eigenvectors of the covariance
matrix M. Then D is a diagonal matrix with diagonal elements equal to the
eigenvalues of M.
Example 2-1-5
Consider the bivariate gaussian pdf with covariance matrix

M = \begin{bmatrix} 1 & \tfrac{1}{2} \\ \tfrac{1}{2} & 1 \end{bmatrix}

Let us determine the transformation A that will result in uncorrelated
random variables. First, we solve for the eigenvalues of M. The characteris-
tic equation is

\det(M - λI) = 0
(1 - λ)^2 - \tfrac{1}{4} = 0
λ = \tfrac{3}{2}, \tfrac{1}{2}

Next we determine the two eigenvectors. If a denotes an eigenvector, we
have

(M - λI)a = 0

With λ_1 = 3/2 and λ_2 = 1/2, we obtain the eigenvectors

a_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix},   a_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}

Therefore,

A = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}

It is easily verified that A^{-1} = A' and that

AMA' = D

where the diagonal elements of D are 3/2 and 1/2.
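A numerical rendering of Example 2-1-5 is sketched below: numpy's symmetric eigensolver returns the eigenvectors of M, from which A is formed so that AMA' is diagonal.

```python
import numpy as np

M = np.array([[1.0, 0.5],
              [0.5, 1.0]])                 # covariance matrix of Example 2-1-5

eigvals, eigvecs = np.linalg.eigh(M)       # eigvecs holds orthonormal eigenvectors as columns
A = eigvecs.T                              # rows of A are the eigenvectors, so A M A' is diagonal

D = A @ M @ A.T
print(eigvals)                             # 0.5 and 1.5
print(np.round(D, 10))                     # diagonal matrix with 0.5 and 1.5 on the diagonal
```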
2-1-5 Upper Bounds on the Tail Probability
In evaluating the performance of a digital communication system, it is often
necessary to determine the area under the tail of the pdf. We refer to this area
as the tail probability. In this section, we present two upper bounds on the tail
probability. The first, obtained from the Chebyshev inequality, is rather loose.
The second, called the Chernoff bound, is much tighter.
Chebyshev Inequality  Suppose that X is an arbitrary random variable with
finite mean m_x and finite variance σ_x². For any positive number δ,

P(|X - m_x| ≥ δ) ≤ \frac{σ_x^2}{δ^2}    (2-1-161)

This relation is called the Chebyshev inequality. The proof of this bound is
relatively simple. We have

σ_x^2 = \int_{-\infty}^{\infty} (x - m_x)^2 p(x)\, dx ≥ \int_{|x - m_x| ≥ δ} (x - m_x)^2 p(x)\, dx
      ≥ δ^2 \int_{|x - m_x| ≥ δ} p(x)\, dx = δ^2 P(|X - m_x| ≥ δ)

Thus the validity of the inequality is established.
It is apparent that the Chebyshev inequality is simply an upper bound on
the area under the tails of the pdf p(y), where Y = X - m_x, i.e., the area of
p(y) in the intervals (-∞, -δ) and (δ, ∞). Hence, the Chebyshev inequality
may be expressed as

1 - [F_Y(δ) - F_Y(-δ)] ≤ \frac{σ_x^2}{δ^2}    (2-1-162)

or, equivalently, as

1 - [F_X(m_x + δ) - F_X(m_x - δ)] ≤ \frac{σ_x^2}{δ^2}    (2-1-163)
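For a sense of how loose (2-1-161) can be, the sketch below compares the Chebyshev bound with the exact two-sided tail probability of a gaussian random variable (computed with scipy's norm survival function); the choices of δ in units of σ are arbitrary.

```python
import numpy as np
from scipy.stats import norm

sigma = 1.0
for delta in [1.0, 2.0, 3.0]:                         # delta measured in standard deviations
    chebyshev = sigma**2 / delta**2                   # bound (2-1-161)
    exact = 2 * norm.sf(delta, loc=0.0, scale=sigma)  # exact P(|X - m_x| >= delta) for a gaussian
    print(delta, chebyshev, exact)                    # the bound exceeds the exact tail by a wide margin
```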
There is another way to view the Chebyshev bound. Working with the zero
mean random variable Y = X - m_x, for convenience, suppose we define a
function g(Y) as

g(Y) = \begin{cases} 1 & (|Y| ≥ δ) \\ 0 & (|Y| < δ) \end{cases}    (2-1-164)

Since g(Y) is either 0 or 1 with probabilities P(|Y| ≥ δ) and P(|Y| < δ),
respectively, its mean value is

E[g(Y)] = P(|Y| ≥ δ)    (2-1-165)

FIGURE 2-1-11  A quadratic upper bound on g(Y) used in obtaining the tail probability (Chebyshev bound).
Now suppose that we upper-bound g(Y) by the quadratic (Y/δ)², i.e.,

g(Y) ≤ \left(\frac{Y}{δ}\right)^2    (2-1-166)

The graph of g(Y) and the upper bound are shown in Fig. 2-1-11. It follows
that

E[g(Y)] ≤ E\left(\frac{Y^2}{δ^2}\right) = \frac{E(Y^2)}{δ^2} = \frac{σ_y^2}{δ^2} = \frac{σ_x^2}{δ^2}

Since E[g(Y)] is the tail probability, as seen from (2-1-165), we have obtained
the Chebyshev bound.
For many practical applications, the Chebyshev bound is extremely loose.
The reason for this may be attributed to the looseness of the quadratic (Y/δ)²
in overbounding g(Y). There are certainly many other functions that can be
used to overbound g(Y). Below, we use an exponential bound to derive an
upper bound on the tail probability that is extremely tight.
Chernoff Bound The Chebyshev bound given above involves the area
under the two tails of the pdf. In some applications we are interested only in
the area under one tail, either in the interval (δ, ∞) or in the interval (-∞, δ).
In such a case we can obtain an extremely tight upper bound by overbounding
the function g(Y) by an exponential having a parameter that can be optimized
to yield as tight an upper bound as possible. Specifically, we consider the tail
probability in the interval (δ, ∞). The function g(Y) is overbounded as

g(Y) ≤ e^{v(Y - δ)}    (2-1-167)

where g(Y) is now defined as

g(Y) = \begin{cases} 1 & (Y ≥ δ) \\ 0 & (Y < δ) \end{cases}    (2-1-168)

FIGURE 2-1-12  An exponential upper bound on g(Y) used in obtaining the tail probability (Chernoff bound).
and v>O is the parameter to be optimized. The graph of g(Y) and the
exponential upper bound are shown in Fig. 2-1-12.
The expected value of g(Y) is

E[g(Y)] = P(Y ≥ δ) ≤ E(e^{v(Y - δ)})    (2-1-169)

This bound is valid for any v ≥ 0. The tightest upper bound is obtained by
selecting the value of v that minimizes E(e^{v(Y - δ)}). A necessary condition for a
minimum is

\frac{d}{dv} E(e^{v(Y - δ)}) = 0    (2-1-170)

But the order of differentiation and expectation can be interchanged, so that

\frac{d}{dv} E(e^{v(Y - δ)}) = E\left[\frac{d}{dv} e^{v(Y - δ)}\right] = E[(Y - δ) e^{v(Y - δ)}]
  = e^{-vδ}[E(Y e^{vY}) - δE(e^{vY})] = 0

Therefore the value of v that gives the tightest upper bound is the solution to
the equation

E(Y e^{vY}) - δE(e^{vY}) = 0    (2-1-171)
Let \hat{v} be the solution of (2-1-171). Then, from (2-1-169), the upper bound on
the one-sided tail probability is

P(Y ≥ δ) ≤ e^{-\hat{v}δ} E(e^{\hat{v}Y})    (2-1-172)

This is the Chernoff bound for the upper tail probability of the zero-mean
random variable Y. The bound is easily extended to a random variable X with
nonzero mean m_x. In this case, we consider the tail probability

P(X - m_x ≥ δ) = P(X ≥ δ_m)

where, by definition, δ_m = m_x + δ. Since δ > 0, it follows that δ_m > m_x. Let
g(X) be defined as
g(X) = \begin{cases} 1 & (X ≥ δ_m) \\ 0 & (X < δ_m) \end{cases}    (2-1-181)

and upper-bounded as

g(X) ≤ e^{v(X - δ_m)}    (2-1-182)

From this point, the derivation parallels the steps contained in (2-1-169)-
(2-1-172). The final result is

P(X ≥ δ_m) ≤ e^{-\hat{v}δ_m} E(e^{\hat{v}X})    (2-1-183)

where δ_m > m_x and \hat{v} is the solution to the equation

E(X e^{vX}) - δ_m E(e^{vX}) = 0    (2-1-184)
In a similar manner, we can obtain the Chernoff bound for the lower tail
probability. For δ < 0, we have

P(X - m_x ≤ δ) = P(X ≤ δ_m) ≤ e^{-\hat{v}δ_m} E(e^{\hat{v}X})

where δ_m = m_x + δ < m_x and \hat{v} < 0 is the solution to (2-1-184).

2-1-6  Sums of Random Variables and the Central Limit Theorem
Consider the sample mean

Y = \frac{1}{n} \sum_{i=1}^{n} X_i

of n statistically independent and identically distributed random variables X_i,
each having a finite mean m_x and a finite variance σ_x². Since E(Y) = m_x and
σ_y^2 = σ_x^2/n, the Chebyshev inequality applied to Y gives the bound

P(|Y - m_x| ≥ δ) ≤ \frac{σ_x^2}{nδ^2}    (2-1-188)

and, hence,

\lim_{n \to \infty} P(|Y - m_x| ≥ δ) = 0    (2-1-189)
Therefore, the probability that the estimate of the mean differs from the true
mean m_x by more than δ (δ > 0) approaches zero as n approaches infinity. This
statement is a form of the law of large numbers. Since the upper bound
converges to zero relatively slowly, i.e., inversely with n, the expression in
(2-1-188) is called the weak law of large numbers.
The Chernoff bound applied to the random variable Y yields an exponential
dependence on n, and thus provides a tighter upper bound on the one-sided tail
probability. Following the procedure developed in Section 2-1-5, we can
determine that the tail probability for Y is

P(Y - m_x ≥ δ) = P\left(\frac{1}{n}\sum_{i=1}^{n} X_i ≥ m_x + δ\right) = P\left(\sum_{i=1}^{n} X_i ≥ nδ_m\right)
  ≤ E\left\{\exp\left[v\left(\sum_{i=1}^{n} X_i - nδ_m\right)\right]\right\}    (2-1-190)

where δ_m = m_x + δ and δ > 0. But the X_i, i = 1, 2, ..., n, are statistically
independent and identically distributed. Hence,

E\left\{\exp\left[v\left(\sum_{i=1}^{n} X_i - nδ_m\right)\right]\right\} = e^{-vnδ_m} E\left[\exp\left(v \sum_{i=1}^{n} X_i\right)\right]
  = e^{-vnδ_m} \prod_{i=1}^{n} E(e^{vX_i})
  = [e^{-vδ_m} E(e^{vX})]^n    (2-1-191)

where X denotes any one of the X_i. The parameter v that yields the tightest
upper bound is obtained by differentiating (2-1-191) and setting the derivative
equal to zero. This yields the equation

E(X e^{vX}) - δ_m E(e^{vX}) = 0    (2-1-192)

Let the solution of (2-1-192) be denoted by \hat{v}. Then, the bound on the upper
tail probability is

P\left(\frac{1}{n}\sum_{i=1}^{n} X_i ≥ δ_m\right) ≤ [e^{-\hat{v}δ_m} E(e^{\hat{v}X})]^n,   δ_m > m_x    (2-1-193)
In a similar manner, we find that the lower tail probability is upper-bounded as

P(Y ≤ δ_m) ≤ [e^{-\hat{v}δ_m} E(e^{\hat{v}X})]^n,   δ_m < m_x    (2-1-194)

where \hat{v} is the solution to (2-1-192).
Example 2-1-7
Let X_i, i = 1, 2, ..., n, be a set of statistically independent random variables
defined as

X_i = \begin{cases} 1 & \text{with probability } p < \tfrac{1}{2} \\ -1 & \text{with probability } 1 - p \end{cases}

We wish to determine a tight upper bound on the probability that the sum
of the X_i is greater than zero. Since p < 1/2, we note that the sum will have a
negative value for the mean; hence we seek the upper tail probability. With
δ_m = 0 in (2-1-193), we have

P\left(\sum_{i=1}^{n} X_i ≥ 0\right) ≤ [E(e^{\hat{v}X})]^n    (2-1-195)

where \hat{v} is the solution to the equation

E(X e^{vX}) = 0    (2-1-196)

Now

E(X e^{vX}) = -(1 - p)e^{-v} + pe^{v} = 0

Hence

\hat{v} = \tfrac{1}{2} \ln\left(\frac{1 - p}{p}\right)    (2-1-197)

Furthermore,

E(e^{\hat{v}X}) = pe^{\hat{v}} + (1 - p)e^{-\hat{v}}

Therefore the bound in (2-1-195) becomes

P\left(\sum_{i=1}^{n} X_i ≥ 0\right) ≤ [pe^{\hat{v}} + (1 - p)e^{-\hat{v}}]^n = [4p(1 - p)]^{n/2}    (2-1-198)

We observe that the upper bound decays exponentially with n, as expected.
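The sketch below compares the Chernoff bound [4p(1 - p)]^{n/2} of Example 2-1-7 against a Monte Carlo estimate of P(Σ X_i ≥ 0); the values of p, n, and the trial count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n = 0.3, 30                                        # illustrative values with p < 1/2
chernoff = (4 * p * (1 - p)) ** (n / 2)               # the bound (2-1-198)

trials = 200_000
x = np.where(rng.random((trials, n)) < p, 1, -1)      # X_i = +1 with probability p, -1 otherwise
empirical = np.mean(np.sum(x, axis=1) >= 0)           # Monte Carlo estimate of P(sum >= 0)

print(empirical, chernoff)                            # the estimate lies below the Chernoff bound
```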
In contrast, if the Chebyshev bound were evaluated, the tail probability
would decrease only inversely with n.
Central Limit Theorem We conclude this section with an extremely useful
theorem concerning the cdf of a sum of random variables in the limit as the
number of terms in the sum approaches infinity. There are several versions of
this theorem. We shall prove the theorem for the case in which the random
variables X_i, i = 1, 2, ..., n, being summed are statistically independent and
identically distributed, each having a finite mean m_x and a finite variance σ_x².
For convenience, we define the normalized random variables

U_i = \frac{X_i - m_x}{σ_x},   i = 1, 2, ..., n

Thus U_i has zero mean and unit variance. Now, let

Y = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} U_i    (2-1-199)

Since each term in the sum has a zero mean and unit variance, it follows that
the normalized (by 1/\sqrt{n}) random variable Y has zero mean and unit variance.
We wish to determine the cdf of Y in the limit as n → ∞.
The characteristic function of Y is

ψ_Y(jv) = E(e^{jvY}) = E\left[\exp\left(\frac{jv}{\sqrt{n}} \sum_{i=1}^{n} U_i\right)\right]
        = \prod_{i=1}^{n} ψ_{U_i}\left(\frac{jv}{\sqrt{n}}\right)
        = \left[ψ_U\left(\frac{jv}{\sqrt{n}}\right)\right]^n    (2-1-200)

where U denotes any of the U_i, which are identically distributed. Now, let us
expand the characteristic function of U in a Taylor series. The expansion yields

ψ_U\left(\frac{jv}{\sqrt{n}}\right) = 1 + \frac{jv}{\sqrt{n}} E(U) - \frac{v^2}{2n} E(U^2) + \frac{(jv)^3}{(\sqrt{n})^3\, 3!} E(U^3) + \cdots    (2-1-201)

Since E(U) = 0 and E(U²) = 1, (2-1-201) simplifies to

ψ_U\left(\frac{jv}{\sqrt{n}}\right) = 1 - \frac{v^2}{2n} + \frac{R(v, n)}{n}    (2-1-202)
where R(v,n)/n denotes the remainder. We note that R(v,n) approaches62 DIGITAL COMMUNICATIONS
zero as n → ∞. Substitution of (2-1-202) into (2-1-200) yields the characteristic
function of Y in the form

ψ_Y(jv) = \left[1 - \frac{v^2}{2n} + \frac{R(v, n)}{n}\right]^n    (2-1-203)

Taking the natural logarithm of (2-1-203), we obtain

\ln ψ_Y(jv) = n \ln\left[1 - \frac{v^2}{2n} + \frac{R(v, n)}{n}\right]    (2-1-204)

For small values of x, ln(1 + x) can be expanded in the power series

\ln(1 + x) = x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \cdots

This expansion applied to (2-1-204) yields

\ln ψ_Y(jv) = n\left[\left(-\frac{v^2}{2n} + \frac{R(v, n)}{n}\right) - \frac{1}{2}\left(-\frac{v^2}{2n} + \frac{R(v, n)}{n}\right)^2 + \cdots\right]    (2-1-205)

Finally, when we take the limit as n → ∞, (2-1-205) reduces to
\lim_{n→∞} \ln ψ_Y(jv) = -\tfrac{1}{2}v^2, or, equivalently,

\lim_{n→∞} ψ_Y(jv) = e^{-v^2/2}    (2-1-206)
But, this is just the characteristic function of a gaussian random variable with
zero mean and unit variance. Thus we have the important result that the sum
of statistically independent and identically distributed random variables with
finite mean and variance approaches a gaussian cdf as n— =. This result is
known as the central limit theorem.
Although we assumed that the random variables in the sum are identically
distributed, the assumption can be relaxed provided that additional restrictions
are imposed on the properties of the random variables. There is one variation
of the theorem, for example, in which the assumption of identically distributed
random variables is abandoned in favor of a condition on the third absolute
moment of the random variables in the sum. For a discussion of this and other
variations of the central limit theorem, the reader is referred to the book by
Cramer (1946).
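The sketch below illustrates the central limit theorem empirically: normalized sums of i.i.d. uniform random variables (an arbitrary choice of distribution) are compared with the standard gaussian cdf at a few points.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n, trials = 50, 100_000                        # number of terms per sum and number of sums (assumed)

x = rng.random((trials, n))                    # X_i uniform on [0, 1): m_x = 1/2, sigma_x^2 = 1/12
u = (x - 0.5) / np.sqrt(1.0 / 12.0)            # normalized variables U_i with zero mean, unit variance
y = u.sum(axis=1) / np.sqrt(n)                 # Y as in (2-1-199)

for point in [-1.0, 0.0, 1.0]:
    print(point, np.mean(y <= point), norm.cdf(point))   # empirical cdf of Y vs gaussian cdf
```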
2-2 STOCHASTIC PROCESSES
Many of the random phenomena that occur in nature are functions of time.
For example, the meteorological phenomena such as the random fluctuations
in air temperature and air pressure are functions of time. The thermal noise
voltages generated in the resistors of an electronic device such as a radio
receiver are also a function of time. Similarly, the signal at the output of a
source that generates information is characterized as a random signal thatCHAPTER 2; PROBABILITY AND STOCHASTIC PROCESSES 63,
varies with time. An audio signal that is transmitted over a telephone channel
is an example of such a signal. All these are examples of stochastic (random)
processes. In our study of digital communications, we encounter stochastic
processes in the characterization and modeling of signals generated by
information sources, in the characterization of communication channels used to
transmit the information, in the characterization of noise generated in a
receiver, and in the design of the optimum receiver for processing the received
random signal.
At any given time instant, the value of a stochastic process, whether it is the
value of the noise voltage generated by a resistor or the amplitude of the signal
generated by an audio source, is a random variable. Thus, we may view a
stochastic process as a random variable indexed by the parameter t. We shall
denote such a process by X(t). In general, the parameter t is continuous,
whereas X may be either continuous or discrete, depending on the characteris-
tics of the source that generates the stochastic process.
The noise voltage generated by a single resistor or a single information
source represents a single realization of the stochastic process. Hence, it is
called a sample function of the stochastic process. The set of all possible sample
functions, e.g., the set of all noise voltage waveforms generated by resistors,
constitute an ensemble of sample functions or, equivalently, the stochastic
process X(t). In general, the number of sample functions in the ensemble is
assumed to be extremely large; often it is infinite.
Having defined a stochastic process X(t) as an ensemble of sample
functions, we may consider the values of the process at any set of time instants
t_1 > t_2 > t_3 > ... > t_n, where n is any positive integer. In general, the random
variables X_{t_i} ≡ X(t_i), i = 1, 2, ..., n, are characterized statistically by their joint
pdf p(x_{t_1}, x_{t_2}, ..., x_{t_n}). Furthermore, all the probabilistic relations defined in
Section 2-1 for multidimensional random variables carry over to the random
variables X_{t_i}, i = 1, 2, ..., n.
Stationary Stochastic Processes  As indicated above, the random variables
X_{t_i}, i = 1, 2, ..., n, obtained from the stochastic process X(t) for any set of
time instants t_1 > t_2 > ... > t_n and any n are characterized statistically by
the joint pdf p(x_{t_1}, x_{t_2}, ..., x_{t_n}). Let us consider another set of n random
variables X_{t_i + t} ≡ X(t_i + t), i = 1, 2, ..., n, where t is an arbitrary time shift.
These random variables are characterized by the joint pdf
p(x_{t_1 + t}, x_{t_2 + t}, ..., x_{t_n + t}). The joint pdfs of the random variables X_{t_i} and X_{t_i + t},
i = 1, 2, ..., n, may or may not be identical. When they are identical, i.e.,
when

p(x_{t_1}, x_{t_2}, ..., x_{t_n}) = p(x_{t_1 + t}, x_{t_2 + t}, ..., x_{t_n + t})    (2-2-1)

for all t and all n, the stochastic process is said to be stationary in the strict
sense. That is, the statistics of a stationary stochastic process are invariant to
any translation of the time axis. On the other hand, when the joint pdfs are
different, the stochastic process is nonstationary.
2-2-1 Statistical Averages
Just as we have defined statistical averages for random variables, we may
similarly define statistical averages for a stochastic process. Such averages are
also called ensemble averages. Let X(t) denote a random process and let
X_{t_1} ≡ X(t_1). The nth moment of the random variable X_{t_1} is defined as

E(X_{t_1}^n) = \int_{-\infty}^{\infty} x_{t_1}^n\, p(x_{t_1})\, dx_{t_1}    (2-2-2)

In general, the value of the nth moment will depend on the time instant t_1 if the
pdf of X_{t_1} depends on t_1. When the process is stationary, however, p(x_{t_1 + t}) =
p(x_{t_1}) for all t. Hence, the pdf is independent of time, and, as a consequence,
Next we consider the two random variables X_{t_i} ≡ X(t_i), i = 1, 2. The
correlation between X_{t_1} and X_{t_2} is measured by the joint moment

E(X_{t_1} X_{t_2}) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_{t_1} x_{t_2}\, p(x_{t_1}, x_{t_2})\, dx_{t_1}\, dx_{t_2}    (2-2-3)

Since this joint moment depends on the time instants t_1 and t_2, it is denoted by
φ(t_1, t_2). The function φ(t_1, t_2) is called the autocorrelation function of the
stochastic process. When the process X(t) is stationary, the joint pdf of the pair
(X_{t_1}, X_{t_2}) is identical to the joint pdf of the pair (X_{t_1 + t}, X_{t_2 + t}) for any arbitrary t.
This implies that the autocorrelation function of X(t) does not depend on the
specific time instants t_1 and t_2 but, instead, it depends on the time difference
t_1 - t_2. Thus, for a stationary stochastic process, the joint moment in (2-2-3) is

E(X_{t_1} X_{t_2}) = φ(t_1, t_2) = φ(t_1 - t_2) = φ(τ)    (2-2-4)

where τ = t_1 - t_2 or, equivalently, t_2 = t_1 - τ. If we let t_2 = t_1 + τ, we have

φ(-τ) = E(X_{t_1} X_{t_1 + τ}) = E(X_{t_2 - τ} X_{t_2}) = φ(τ)

Therefore, φ(τ) is an even function. We also note that φ(0) = E(X_{t_1}^2) denotes
the average power in the process X(t).
There exist nonstationary processes with the property that the mean value
of the process is independent of time (a constant) and where the autocorrela-
tion function satisfies the condition φ(t_1, t_2) = φ(t_1 - t_2). Such a process is
called wide-sense stationary. Consequently, wide-sense stationarity is a less
stringent condition than strict-sense stationarity. When reference is made to a
stationary stochastic process in any subsequent discussion in which correlation
functions are involved, the less stringent condition (wide-sense stationarity) is
implied.
Related to the autocorrelation function is the autocovariance function of a
stochastic process, which is defined as

μ(t_1, t_2) = E\{[X_{t_1} - m(t_1)][X_{t_2} - m(t_2)]\}
           = φ(t_1, t_2) - m(t_1)\, m(t_2)    (2-2-5)