Pham MASc Thesis
by
Jennifer Pham
Copyright © by Jennifer Pham, 2007
Time-interleaved ∆Σ-DAC
for Broadband Wireless Applications
Jennifer Pham
Master of Applied Science, 2007
Graduate Department of Electrical and Computer Engineering
University of Toronto
Abstract
The analysis and design of a time-interleaved delta-sigma digital-to-analog converter
(TIM ∆Σ-DAC) is presented. The digital front-end of the TIM ∆Σ-DAC comprises a 95th-order
time-interleaved-by-8 FIR interpolation filter and a 3rd-order time-interleaved-by-8 ∆Σ
modulator. The time-interleaved architecture uses parallelism to support a low OSR of 8,
which results in a large effective bandwidth for broadband applications. The 4-bit output of
the ∆Σ modulator is converted into analog using 16 current-steering cells with continuous
current calibration. The chip was fabricated in 90nm CMOS. It was designed to operate
at 4GS/s with a bandwidth of 250MHz. The analog back-end was tested with modulated
data from a simulation of the digital front-end. It was measured at 2.66GS/s and achieved
a bandwidth of 166MHz, an SNR of 46dB and an SFDR of 56dB. At 2GS/s, the prototype
consumed 102mW from a 1V supply.
Acknowledgments
Throughout the course of my thesis work, I have encountered numerous obstacles, at
which there was always someone coming along with ingenuity, inspiration, and encourage-
ment. There are so many people I would like to thank.
First of all, I am truly grateful to my supervisor, Tony Chan Carusone, who has given me
continuous support and insight throughout this work. He gave me the freedom of research
and motivated me to explore the field where I was a complete stranger. He has been like a
friend who is always there to help and to listen.
I would also like to thank my colleagues for lending a hand whenever I got caught in
the midst of confusion. Without their support, I would have had much trouble completing this
work. In particular, I would like to express my gratitude to Kentaro Yamamoto and Joseph
Aziz who assisted me in the CAD design and experimental testing; Tyler Brandon from
University of Alberta who patiently fixed countless DRC problems and guided me through
the maze of 90nm CMOS Place & Route; Ahmad Darabiha, Ian Kuon, and Zdravko Lukic
who supported me at different phases of the digital design flow; Keith Tang who provided me
with custom RF pads; Marcus van Ierssel, Oleksiy Tyshchenko, and Cintia Man who helped
me with the PCB design and digital test setup; and Jaro Pristupa who saved me in many
CAD tool panics. I would also like to thank my peers in BA5000 for the endless laughter
and priceless memories.
Table of Contents
List of Tables
1 Introduction
1.1 Motivations
1.1.1 Oversampled ∆Σ vs. Nyquist-rate DAC
1.1.2 Lowpass vs. Bandpass ∆Σ-DAC
1.1.3 Time-interleaved vs. Conventional ∆Σ-DAC
1.1.4 Summary
1.2 State-of-the-Art
1.3 Thesis Outline
2 Theoretical Background
2.1 System Architecture for ∆Σ-DAC
2.2 ∆Σ Modulator Architectures
2.2.1 Error-Feedback ∆Σ Modulator Architecture
2.2.2 Error-Feedback ∆Σ Modulator Stability Analysis
2.3 Time-interleaved ∆Σ Modulator
2.3.1 Polyphase Decomposition
2.3.2 Block Digital Filtering
2.3.3 Time-interleaved Error-Feedback ∆Σ Modulator
2.3.4 Practical Considerations
6 Conclusions
6.1 Future Work
References
List of Figures
5.1 Die photos of the TIM ∆Σ-DAC chip fabricated in 90nm CMOS
5.2 TIM ∆Σ-DAC prototype a) Packaging and b) Testboard
5.3 Analog back-end test flow
5.4 Full test setup for Agilent 93K SOC or Agilent ParBert platform
5.5 Experimental setup for analog back-end
5.6 Current-steering DAC staircase transient response
5.7 Output spectrum with calibration feed-through
5.8 Clock Divider Transient Response
5.9 CS-DAC transient response for a single-tone, 0dBFS input amplitude (top: single-ended outputs; bottom: differential output)
5.10 Noise shape and inband spectra for a single-tone, 0dBFS input amplitude at 0.13fB and 0.29fB
5.11 CS-DAC accuracy performance with single-tone input and passive load
5.12 Two-tone spectrum and SFDR measurements
5.13 Multi-tone Test
B.2 TIM ∆Σ-DAC output spectrum with analog LPF for Matlab simulations with 0dBFS input amplitude at different input frequencies a) 0.13fB b) 0.25fB c) 0.50fB d) 0.93fB
B.3 TIM ∆Σ-DAC response with an ideal vs. analog filter
B.4 TIM-IF-DSM output spectrum with thermometer DAC element mismatches
List of Acronyms
FO2 Fanout-of-2
FVT Filter Visualization Tool
IF Interpolation Filter
IIR Infinite Impulse Response
ILA Individual Level Averaging
IM2 Second-order Intermodulation
IM3 Third-order Intermodulation
IST Information Society Technologies
LPF Low-pass Filter
LSB Least Significant Bit
LTI Linear Time Invariant
LVS Layout Versus Schematic
MASH Multi-Stage Noise Shaping
MSB Most Significant Bit
MTPR Multi-tone Power Ratio
NTF Noise Transfer Function
OBG Out-of-band Gain
OFB Output-Feedback
OFDM Orthogonal Frequency Division Multiplexing
OSR Oversampling Ratio
ParBert Parallel Bit-Error-Rate
PCB Printed Circuit Board
PLL Phase Locked Loop
PVT Process, Voltage, and Temperature
RCA Ripple Carry Adder
RTL Register Transfer Level
SFDR Spurious-Free Dynamic Range
SISO Single-Input Single-Output
SNDR Signal to Noise plus Distortion Ratio
SNR Signal to Noise Ratio
In recent years, there has been great competition to develop ultra-high data rate wireless communication
systems to meet the emergence of broadband multimedia applications. The
demand for wideband wireless personal area networking (WPAN) or wireless local area net-
work (WLAN), and point-to-point or point-to-multipoint data links continuously pushes the
capacity of wireless networks. Currently, the transfer capacity exceeds what can be accom-
modated in the widely used unlicensed bands (2.4GHz and 5.8GHz) for WLAN systems.
An alternative solution is to resort to higher bands where bandwidth is abundant. Namely,
in 2001, the Federal Communications Commission (FCC) set aside a continuous unlicensed
block of 7GHz of spectrum between 57 and 64 GHz for short-range indoor WPAN/WLAN
applications. A year later, the FCC opened up an unlicensed block of 7.5GHz of spectrum between
3.1 GHz and 10.6 GHz for short-range indoor ultra wideband (UWB) applications [3].
2 Chapter 1. Introduction
1.1. Motivations
Applications
There has been continuous research by the European IST (Information
Society Technologies) on system integration for 60GHz radios. They proposed
a hybrid dual-frequency system called BROADWAY which is based on an integration of
HIPERLAN/2 (an existing 802.11a WLAN at 5GHz) and HIPERSPOT (an innovative fully
ad-hoc extension at 60GHz). As a result, the intermediate frequency is purposely taken at
5GHz as shown in figure 1.1. This integration will result in wider acceptance and lower cost
for both systems while providing a new solution for dense urban deployments [4].
While the final standards for 60GHz WPAN/WLAN and UWB are still under debate,
both systems are likely to employ multi-band OFDM (orthogonal frequency division multi-
plexing) schemes capable of transmitting data in the 500Mb/s range [4, 5, 6, 7]. The DAC
accuracy required for these systems ranges anywhere from 4-5 bits for an UWB transmitter
[6, 7] to 8-10 bits for a 60GHz transmitter [8]. Since the 60GHz transmitter demands more
rigorous requirements, this work will focus on designing a DAC for this application which
will certainly suffice for an UWB design.
complexity and its error-correction circuitry grows exponentially by a factor of $2^k$ for a k-bit
DAC. Typically, k is chosen to be between 4 and 6 bits for practical implementations [1]. Some
sophisticated designs for ∆Σ-DACs with over 6 bits such as dual-truncation or segmentation
can also yield high SNR performance results but do not alleviate the non-linearity errors.
Another way to increase the bandwidth of a ∆Σ-DAC is to parallelize the DSM into
multiple channels operating at lower speeds and then combine their outputs. Parallel DSMs can
be categorized into three groups based on different multiplexing schemes: frequency divi-
sion multiplexing (FDM), code division multiplexing (CDM), and time division multiplexing
(TDM).
Figure 1.2 shows the block diagram of a parallel system based on a FDM scheme. This
system contains a bank of parallel bandpass DSMs which have different band reject noise
transfer functions and operate on different frequency sub-bands [20]. A digital bandpass
filter attenuates the out-of-band noise in each channel and allows for recombination of the
frequency-decomposed input signal [21]. This system has a high level of design and hardware
complexity due to the requirement of many bandpass DSMs and bandpass filters, each with
different center frequencies.
Figure 1.3 shows the block diagram of a parallel system based on a CDM scheme, which
is also known as a Π − ∆Σ modulator. In [22], a Hadamard transformation is used to
decompose the input into multiple spread spectrum channels by modulating it with different
Lastly, figure 1.4 shows the block diagram of a parallel system based on a TDM scheme.
Among the three different structures, this is the simplest form of parallelism. Here, the
input is demultiplexed to M channels in which each operates at (1/M ) of the input sampling
frequency. The channels are then recombined through unit delays and a multiplexer. How-
ever, this brute-force approach often results in a small SNR improvement. Specifically, there
is only a 3dB-improvement in SNR for each doubling of the number of parallel modulators
regardless of their order [2].
Alternatively, a novel approach to modify the TDM scheme proposed by Khoini-Poorfard
et al. has significantly improved the SNR while meeting the low OSR requirement. In [2, 23],
a block digital filtering technique was used to successfully transform a conventional DSM
into a time-interleaved DSM with interconnecting channels. By having M interconnected
channels running in parallel as shown in figure 1.5, the total effective sampling rate becomes
M times the sampling rate of each channel. The improvement in SNR is 6(n + 1/2)dB for
each doubling of the number of nth order modulators [2]. Furthermore, the preceding in-
terpolation filter can be time-interleaved based on a polyphase decomposition without an
increase in hardware complexity. Hence, this motivates the choice of time-interleaved DSM
architecture for this work. Further details on the time-interleaved ∆Σ-DAC (TIM ∆Σ-DAC)
will be discussed later in chapter 2. To distinguish it from the conventional TDM scheme,
this modified TDM scheme will be referred to as TIM from this point onward.
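To put the two scaling rules quoted from [2] side by side, the short Python sketch below tabulates the SNR gain relative to a single modulator for both schemes. The function names are hypothetical, and the figures are only the rule-of-thumb values implied by "3dB per doubling" and "6(n + 1/2)dB per doubling", not simulation results.

```python
import math

def snr_gain_plain_tdm(num_channels):
    """Approximate SNR gain (dB) of brute-force TDM: 3 dB per doubling of channels."""
    return 3.0 * math.log2(num_channels)

def snr_gain_tim(num_channels, order):
    """Approximate SNR gain (dB) of interconnected TIM: 6(n + 1/2) dB per doubling."""
    return 6.0 * (order + 0.5) * math.log2(num_channels)

# Compare both schemes for a 3rd-order modulator (the order is an illustrative choice).
for channels in (2, 4, 8):
    print(channels, snr_gain_plain_tdm(channels), snr_gain_tim(channels, order=3))
```

For eight channels and a 3rd-order modulator, the rules give 9 dB for the brute-force scheme against 63 dB for the interconnected scheme, which is the gap that motivates the TIM architecture.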
1.1.4. Summary
In summary, the motivations for this work are:
• To push ∆Σ-DAC designs to higher speeds to meet the demands of broadband wireless
applications and to accommodate the design challenges of deep-submicron processes.
• To employ a lowpass ∆Σ-DAC design due to its reasonable level of complexity and
feasibility.
Table 1.1 summarizes the design targets of this work. Due to the speed and system
integration requirements, the technology is chosen to be STMicroelectronics 90nm CMOS,
1V supply process.
1.2. State-of-the-Art
As mentioned earlier, some published Nyquist DACs using CMOS technology can meet the
required specifications shown in table 1.1. For example, a recently published Nyquist DAC in
[11] has a measured resolution of at least 8 bits up to a bandwidth of 193MHz at a sampling
rate of 800MS/s while consuming 49mW. In [12] and [14], the bandwidth is up to 250MHz
for a sampling rate of 500MS/s and a resolution of 12 and 10 bits while consuming 216mW
and 125mW, respectively. Another impressive design in 0.35µm CMOS [13] goes beyond
the required specifications with a bandwidth of 500MHz for a sampling rate of 1GS/s and
10-bit resolution while consuming 110mW. For a conventional (i.e.: without parallelism)
lowpass ∆Σ-DAC fabricated in CMOS technology, the most relevant work in 0.5µm could
only achieve 5MHz bandwidth [24].
The idea of TDM parallelism was introduced many years ago but is still uncommon
in DAC applications. For instance, the DAC in [18] employs a heterodyne technique to
commutate the output of the polyphase interpolation filter into multiple parallel paths.
Each path has its own DSM to perform the modulation which is then time-interleaved in
the digital domain before feeding into the single DAC. While time-interleaving, the parallel
spectra are aligned such that the signal-band experiences coherent gain while the noise-band
experiences destructive cancellation. Although only simulation results were presented in [18],
it promises a solution for a wideband ∆Σ-DAC, as well as the possibility of a quadrature
parallel ∆Σ-DAC.
Similar to the case of conventional ∆Σ modulation, much research effort has been focused
on TIM ∆Σ-ADCs while TIM ∆Σ-DACs have been largely overlooked. In [26], a 2nd-order
Figure 2.1 illustrates the basic architecture of a ∆Σ-DAC. The digital front-end contains
the interpolation filter (IF) and the DSM, while the analog back-end contains the multi-bit
DAC and the analog reconstruction filter. Figure 2.2 shows the signal spectrum at each
internal node of the ∆Σ-DAC. The input signal, x, is an N-bit digital stream sampled at the
Nyquist rate $f_N$.
The IF serves two purposes: to raise the input frequency ($f_N$) by an oversampling ratio
(OSR) and to suppress all unwanted replicas of the signal between baseband and the sampling
frequency (i.e., $f_S = \mathrm{OSR} \cdot f_N$), which arise due to the upsampling. The out-of-band
attenuation of the IF improves the dynamic range of the DSM since larger signals can be
accommodated. In addition, it reduces the attenuation requirements on the analog filter
since only out-of-band truncation noise needs to be suppressed. Finally, the amount of
intermodulated out-of-band noise that can fold back into the signal band is reduced, thus
relaxing the analog filter linearity requirements.
The DSM truncates the word-length of its signal to k bits where k < N . The modulator
output contains the input signal, as well as the filtered truncation noise caused by the reduced
word-length. Similar to an analog DSM, an ideal 1-bit modulator would yield an inherently
linear DAC. However, it may cause loop instability, as well as make the analog filter’s design a
challenging task due to the high-frequency content of the high slew-rate output. In contrast,
multi-bit modulators improve both loop stability and noise shaping capability by allowing a
higher order noise transfer function. Also, they contain less out-of-band noise and lower slew-
rate requirements which significantly reduce the complexity of the analog filter. However,
additional circuitry is required to correct for the nonlinearity of a multi-bit DAC. Overall,
the performance and design benefits outweigh the additional hardware, thus favouring the
multi-bit structure in most ∆Σ-DACs, e.g. [24, 28, 29, 30, 31].
Ideally, the DAC should produce an analog signal at its output without any distortion.
Thus, its output spectrum should be identical to that at the output of the DSM. Finally, the
analog reconstruction low-pass-filter should suppress most of the out-of-band noise, leaving
only the signal spectrum within the band of interest.
In the EFB structure, instead of feeding back the MSBs of the output V(z), the discarded
LSBs (i.e.: the truncation error E(z)), are fed back to the input. The digital loop filter, H(z),
is now located in the feedback path rather than in the forward path as in the conventional
DSM. Linear analysis shows that the noise transfer function (NTF) for an EFB-DSM is:
$$\mathrm{NTF}(z) = (1 - z^{-1})^n = 1 - H(z)$$
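As an illustration of the error-feedback loop described above, the following Python sketch models a 3rd-order EFB-DSM in integer arithmetic and checks that the output differs from the input by exactly the truncation error shaped by $(1 - z^{-1})^3$. The input tone, the number of discarded bits (`drop_bits`) and the helper names are assumptions chosen for illustration; no limiter or finite word length is modelled.

```python
import math

def efb_dsm(u, drop_bits, taps=(3, -3, 1)):
    """Error-feedback modulator sketch.

    taps are the coefficients of H(z) = 1 - (1 - z^-1)^3 = 3z^-1 - 3z^-2 + z^-3.
    Returns the truncated output v(n) and the truncation error e(n) = v(n) - y(n).
    """
    lsb_history = [0] * len(taps)            # discarded LSBs, most recent first
    v_out, err = [], []
    for sample in u:
        # Truncator input: input sample plus the filtered, previously discarded LSBs.
        y = sample + sum(c * d for c, d in zip(taps, lsb_history))
        v = (y >> drop_bits) << drop_bits    # keep the MSBs (floor truncation)
        lsbs = y - v                         # discarded LSBs, 0 <= lsbs < 2**drop_bits
        lsb_history = [lsbs] + lsb_history[:-1]
        v_out.append(v)
        err.append(v - y)
    return v_out, err

def shaped_error(err, n):
    """e(n) - 3e(n-1) + 3e(n-2) - e(n-3), i.e. (1 - z^-1)^3 applied to e."""
    def past(d):
        return err[n - d] if n - d >= 0 else 0
    return past(0) - 3 * past(1) + 3 * past(2) - past(3)

# Hypothetical test tone: 12-bit-range sine, 8 LSBs discarded per sample.
u = [round(2000 * math.sin(2 * math.pi * n / 64)) for n in range(256)]
v, e = efb_dsm(u, drop_bits=8)
print(all(v[n] - u[n] == shaped_error(e, n) for n in range(len(u))))
```

Because the loop filter sits in the feedback path and is FIR, the whole modulator reduces to a few additions on the discarded LSBs, which is the property the stability analysis later relies on.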
the dominant errors in a digital DSM are due to coefficient truncation and round-off errors
of the digital operations [1]. Similar to an analog DSM, these can affect the modulator’s
noise shaping capability. Further details on this topic will be discussed later in chapter 4.
Higher-order (i.e., 3rd-order and higher) EFB-DSMs are often chosen to achieve higher
in-band noise shaping which directly corresponds to higher ENOB. However, a high order
EFB-DSM is prone to suffer from instability when the input to the truncator (i.e: Y(z) in
figure 2.3) grows beyond the operating range of the digital number representation.
For signed or unsigned arithmetic, an overflow causes Y(z) to saturate to its largest pos-
sible value. However, for 2’s complement, overflows cause Y(z) to wrap around, implying the
output V(z) suddenly decreases with increasing Y(z). While saturation is usually acceptable,
wrap-around causes large errors and must be prevented [1]. Since 2’s complement arithmetic
operations are generally advantageous, it is critical to resolve the overflow wrap-around problem.
By adding a digital limiter before the truncator [32] as shown in figure 2.4, Y(z) will
saturate before an overflow can occur.
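The difference between the two overflow behaviours can be made concrete with a small Python sketch. The 8-bit word length and the function names are assumptions for illustration; the limiter here is simply a clamp to the representable range.

```python
def wrap_twos_complement(value, bits):
    """Reduce value modulo 2**bits and reinterpret it as a signed two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value

def saturate(value, bits):
    """Clamp value to the signed range representable with the given number of bits."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))

# A value just above the 8-bit maximum of 127:
print(wrap_twos_complement(130, 8))   # wraps to a large negative number
print(saturate(130, 8))               # stays at the positive limit
```

With wrap-around, a small overshoot turns into an error of nearly the full range with the wrong sign, whereas the limiter keeps the error equal to the overshoot itself.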
In addition to the external limiter, certain conditions must be imposed on the truncator
input to improve the modulator’s robustness and stability. Much research has been focused
on improving DSM stability, yet there is no solid theory that predicts this
behaviour for high-order DSMs. A conservative empirical rule, Lee’s criterion, which
only applies to single-bit modulators, requires the NTF’s out-of-band gain (OBG) to be
less than 1.5 (i.e., $\max|\mathrm{NTF}(e^{j\omega})| < 1.5$) [33]. For a multi-bit modulator, a stability condition
proposed by Richard Schreier [1] determines how many input truncation levels are
needed to keep the DSM stable. The condition states that for any input less than half of the
quantizer input range, A, the modulator is guaranteed not to experience overloading (i.e.,
$\max|u(n)| < A/2 + 2$). While these conditions ensure stability of the DSMs, they dramatically
reduce the input dynamic range (DR) and thus, limit the achievable performance of
higher order modulators.
A bit-wise stability analysis on EFB-DSM (figure 2.5) in [33] allows higher dynamic range,
as well as higher out-of-band gain. In the EFB architecture, since H(z) is an FIR transfer
function, there is no need for an accumulator, unlike conventional output-feedback (OFB)
topologies. Hence, the word-length at all internal nodes can be predicted without complex
numerical analysis.
Let U(z) be a digital input stream of word-length N and T be a k-bit truncator. The
input summer adds at most 1 bit to give (N + 1) bits to the truncator. Here, the truncation
is done by simply splitting off the k MSBs to V(z) and feeding back the (N + 1 − k) LSBs to the
loop filter H(z). Also, assume that the number of additional bits due to H(z) (i.e., $n_{H(z)}$) is
the same as its order. This is a reasonable assumption since the number of taps of H(z) is the
same as its order. Hence, in order to keep all internal signals bounded and as long as H(z)
is an FIR filter, the number of bits at the output of H(z) can only be at most N and thus:
$$(N + 1 - k) + n_{H(z)} = N \;\Rightarrow\; n_{H(z)} = k - 1 \tag{2.3}$$
Equation 2.3 implies that in order to have all internal signals bounded, the order of H(z)
must be 1 less than the number of truncating bits. In other words, the stability criterion is:
An error-feedback modulator with a k-bit truncator and a loop filter of
order (k-1) is stable. [33]
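The bit bookkeeping behind this criterion can be written out as a tiny Python helper. The function name and the 12-bit input word are hypothetical; the arithmetic is only the word-length budget of equation 2.3, under the same assumption that the loop filter adds as many bits as its order.

```python
def efb_word_lengths(n_bits, k_bits):
    """Word-length budget of an EFB modulator with an N-bit input and a k-bit truncator.

    Returns (bits fed back to H(z), largest loop-filter order, bits at the output of H(z)).
    """
    feedback_bits = n_bits + 1 - k_bits          # LSBs discarded by the truncator
    max_order = k_bits - 1                       # n_H(z) = k - 1 from equation 2.3
    filter_out_bits = feedback_bits + max_order  # must not exceed N
    return feedback_bits, max_order, filter_out_bits

# Hypothetical 12-bit input with a 4-bit truncator.
print(efb_word_lengths(12, 4))
```

For a 4-bit truncator the budget allows a 3rd-order loop filter, and the loop-filter output fits back into the N-bit input word regardless of N.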
The simulation results of an EFB system in [33] based on the above criterion show a
superior performance in both stability and signal-to-noise ratio over a conventional OFB
system. The k-bit EFB system can tolerate a full-scale out-of-band gain (OBG) of $2^{k-1}$
while the equivalent OFB system can only tolerate OBG up to approximately 3.5.
The combination of both a limiter and a stability-criterion-based EFB modulator design
shown in figure 2.6 ensures that the modulator is stable and robust.
Essentially, equation 2.4 groups the impulse-response coefficients h(n) into even samples,
$e_0(n) = h(2n)$, and odd samples, $e_1(n) = h(2n + 1)$. If the z-transforms of $e_0$ and $e_1$ are $E_0(z)$
and $E_1(z)$, respectively, then:
$$E_0(z) = \sum_{n=-\infty}^{\infty} h(2n)\, z^{-n} \quad \text{and} \quad E_1(z) = \sum_{n=-\infty}^{\infty} h(2n + 1)\, z^{-n}$$
$$H(z) = E_0(z^2) + z^{-1} E_1(z^2) \tag{2.5}$$
The quantities $E_0(z)$ and $E_1(z)$ are the polyphase components of H(z) and the representation
in (2.5) is called the two-component polyphase decomposition of H(z). This decomposition
is valid whether H(z) is an FIR or an IIR filter. Also, it is possible to extend
H(z) to an M-component polyphase decomposition in the form:
$$H(z) = \sum_{k=0}^{M-1} z^{-k} E_k(z^M) \tag{2.6}$$
Basically, the impulse-response coefficients h(n) have been divided into M groups, and the
$E_k(z)$ are simply the z-transforms of the M-fold decimated versions of h(n). Here, (2.6) is called a Type
1 polyphase decomposition and the $E_k(z)$ are called the Type 1 polyphase components. A Type 2
polyphase decomposition is similar to Type 1, except that the components are renumbered:
$$H(z) = \sum_{k=0}^{M-1} z^{-(M-1-k)} R_k(z^M) \quad \text{where } R_k(z) = E_{M-1-k}(z) \tag{2.8}$$
Type 1 and Type 2 decompositions are well suited for the design of decimation and
interpolation filters, respectively. Their implementations can be found in full detail in [34].
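The Type 1 and Type 2 decompositions can be checked numerically with a short Python sketch. The example coefficients, the choice M = 4 and the evaluation point z are arbitrary assumptions; the sketch only confirms that both recombinations reproduce H(z) for an FIR filter.

```python
import cmath

def polyphase_components(h, m):
    """Type 1 components: e_k(n) = h(nM + k) for k = 0..M-1."""
    return [h[k::m] for k in range(m)]

def eval_fir(coeffs, z):
    """Evaluate sum_n coeffs[n] * z**(-n) at a complex point z."""
    return sum(c * z ** (-n) for n, c in enumerate(coeffs))

h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # arbitrary example impulse response
M = 4
z = 1.1 * cmath.exp(0.3j)             # arbitrary evaluation point

E = polyphase_components(h, M)
R = E[::-1]                           # Type 2 renumbering: R_k = E_{M-1-k}

direct = eval_fir(h, z)
type1 = sum(z ** (-k) * eval_fir(E[k], z ** M) for k in range(M))
type2 = sum(z ** (-(M - 1 - k)) * eval_fir(R[k], z ** M) for k in range(M))
print(abs(direct - type1), abs(direct - type2))
```

Both differences are at the level of floating-point rounding, since the two forms only regroup the same terms of H(z).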
where $u_k(n) = u(nM + k)$ and $v_k(n) = v(nM + k)$ for $0 \le k \le M - 1$. Equations 2.9 and
2.10 closely resemble the polyphase decomposition form in (2.7). In fact, their components
are M-fold decimated versions of u(n) and v(n).
Hence, the z-transforms of the two vector sequences are related by an M×M transfer matrix
$\mathbf{H}(z)$, i.e.:
$$\mathbf{V}(z) = \mathbf{H}(z)\,\mathbf{U}(z) \tag{2.11}$$
Figure 2.7: (a) Scalar transfer function, (b) Time-interleaved-by-M version [2]
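The M×M matrix $\mathbf{H}(z)$ is a blocked version of the scalar transfer function $H(z)$ and its implementation is called block
digital filtering. Figure 2.7(b) depicts a time-interleaved-by-M version of a scalar system.
$$\mathbf{H}(z) = \begin{bmatrix}
H_{11} & H_{12} & H_{13} & \cdots & H_{1M} \\
H_{21} & H_{22} & H_{23} & \cdots & H_{2M} \\
H_{31} & H_{32} & H_{33} & \cdots & H_{3M} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
H_{M1} & H_{M2} & H_{M3} & \cdots & H_{MM}
\end{bmatrix}$$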
The elements in the first row of the $\mathbf{H}(z)$ matrix are the Type 1 polyphase components
of $H(z)$ as defined in (2.6). Each element $H_{ij}$ corresponds to the contribution of the jth
input to the ith output. For example, $H_{12}$ corresponds to the contribution of the input
of path 2 to the output of path 1.
In the matrix $\mathbf{H}(z)$, each row is a circularly shifted version of the row above it except
for the elements below the diagonal entries, which also contain a delay. This type of matrix
is called pseudo-circulant, which is a necessary condition for the block digital filter $\mathbf{H}(z)$ to
represent a SISO linear time-invariant transfer function $H(z)$ [23].
Since $H_{ij}$ corresponds to the contribution of the jth input to the ith output, the architecture
of a TIM-DSM can be realized as depicted in figure 2.9. Compared to the equivalent
OFB modulator in [2], the EFB structure has a lower level of circuit complexity and requires
much less hardware. These savings become even more significant for a higher-order
modulator and for a higher time-interleaving factor, M.
In summary, the steps to realize the architecture of any TIM-DSM are as follows:
2. Perform M-component polyphase decomposition of H(z) using equation (2.6) and (2.7)
3. Substitute $E_k(z)$ into equation (2.12) to find the M×M matrix $\mathbf{H}(z)$
For a conventional IF followed by a TIM-DSM, the input is first upsampled by the inter-
polator then downsampled by the DSM’s input demultiplexer. By applying the same digital
block filtering technique on G(z), the IF can also be transformed into a time-interleaved IF
(TIM-IF). If the time-interleaving factor for the IF is the same as that of the DSM (i.e., M),
no multiplexer/demultiplexer is needed between them. Furthermore, if M is chosen to be
the same as OSR, the upsampler preceding the IF is also eliminated. Figure 2.11 shows the
TIM-IF integrated together with the TIM-DSM. Here, all sub-blocks operate at a sampling
rate of $f_N \cdot (\mathrm{OSR}/M)$, except the output multiplexer which operates at a sampling rate of $f_S$
(i.e., $f_N \cdot \mathrm{OSR}$).
Note that the upsampler preceding G(z) in a conventional IF ensures that only one of
every M inputs is nonzero. This simplifies the $\mathbf{G}(z)$ of a TIM-IF from an M×M matrix down
to an M×1 matrix. In other words, this reduces the required number of elements of $\mathbf{G}(z)$
from $M^2$ down to $M$. Thus, only the first column of $\mathbf{G}(z)$ is implemented, resulting in a
TIM-IF having the same complexity as a conventional IF but operating at $(1/M)$th the rate
[2]. It should also be noted that unlike the case of a TIM-DSM where the internal paths are
interconnected, the paths of the TIM-IF are independent of each other. Using the same steps to
realize a TIM structure from the previous section, an M×1 matrix for a TIM-IF is given by:
$$\mathbf{G}(z) = \begin{bmatrix} I_0(z) \\ z^{-1} I_{M-1}(z) \\ z^{-1} I_{M-2}(z) \\ \vdots \\ z^{-1} I_1(z) \end{bmatrix} \tag{2.14}$$
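The claim that a time-interleaved IF needs only M independent polyphase branches, each running at the low rate, can be illustrated with the Python sketch below. It compares a conventional upsample-then-filter interpolator against M parallel branches built from the polyphase components of g(n). The filter coefficients, input samples and function names are arbitrary assumptions, and the sketch uses the natural branch order $I_0, I_1, \ldots$ rather than the reversed, delayed ordering of (2.14).

```python
def upsample_then_filter(x, g, m):
    """Conventional interpolator: insert M-1 zeros per sample, then run g at the high rate."""
    up = []
    for sample in x:
        up.append(sample)
        up.extend([0] * (m - 1))
    return [sum(g[j] * up[n - j] for j in range(len(g)) if n - j >= 0)
            for n in range(len(up))]

def polyphase_interpolate(x, g, m):
    """M independent branches, each a low-rate FIR built from one polyphase component of g."""
    branches = [g[k::m] for k in range(m)]        # I_k has coefficients g(Mq + k)
    out = []
    for n in range(len(x)):                       # one low-rate input sample per step
        for k in range(m):                        # output multiplexer visits each branch
            out.append(sum(c * x[n - q] for q, c in enumerate(branches[k]) if n - q >= 0))
    return out

g = [1, 3, 5, 7, 9, 7, 5, 3, 1, 2, 4, 6]          # arbitrary 12-tap example filter
x = [4, -2, 7, 0, 3, -5, 1]                       # arbitrary input samples
print(upsample_then_filter(x, g, 4) == polyphase_interpolate(x, g, 4))
```

Each branch only ever multiplies nonzero input samples, so the total number of multiply-accumulate operations per output sample drops by a factor of M while the combined output stays identical.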
2.5. Summary
In general, this chapter gave a brief overview of ∆Σ-DAC architectures. Particularly, the TIM
∆Σ-DAC is of great interest since it combines the well-known benefits of a ∆Σ modulator
and the potential for wider bandwidth of a parallel structure. The IF can also be time-
interleaved to simplify the overall integrated TIM ∆Σ-DAC design while resulting in no
additional hardware complexity.
Chapter 3
Time-interleaved ∆Σ-DAC Design
This chapter discusses the architectural design of a time-interleaved (TIM) ∆Σ-DAC. The
digital front-end of a TIM ∆Σ-DAC contains a time-interleaved interpolation filter (TIM-IF)
and a time-interleaved ∆Σ modulator (TIM-DSM). The analog back-end of a TIM ∆Σ-DAC
contains a DAC and an analog reconstruction filter. Specific details of these sub-blocks are
discussed in the order of which they appear in the system.
From table 1.1, the design targets an ENOB of around 9 bits, which corresponds (using
SNDR ≈ 6.02 · ENOB + 1.76 dB) to an accuracy (SNR) and linearity (SFDR) performance of approximately 56 dB. Note that
SNR is the ratio of the fundamental signal power to the inband noise power, but it does not
account for harmonic distortion. The parameter that accounts for both noise and distortion
is called the SNDR (Signal to Noise plus Distortion Ratio), which is typically lower than the
SNR. The design targets in table 1.1 are conservative but the top-level design aims for even
types of windowing (e.g.: Kaiser, Blackman-Harris, Hamming, Gaussian, etc) which have
different trade-offs depending on the design. A long polyphase FIR length gives a high cutoff
frequency (i.e.: wide bandwidth) and high attenuation but also has a high implementation
complexity. In this application, Kaiser windowing (with α=0.5 by default) gives the optimal
trade-offs in terms of bandwidth, attenuation and complexity.
To design an FIR interpolation filter (IF), Kaiser windowing requires a polyphase length
(pl) and a stopband attenuation ($\alpha_s$, in dB). The IF cutoff becomes sharper with higher pl
up to the point where pl is large enough that increasing it further yields only a small
improvement. Note that only when pl → ∞ does the IF become ideal. In addition, a large
pl results in an impractical implementation due to the large number of coefficients (i.e.,
filter polyphase terms or filter taps). Through simulations, pl is chosen to be 96, which
corresponds to a 95th-order FIR filter. On the other hand, increasing $\alpha_s$ gives higher out-of-band
attenuation, hence reducing the analog filter’s attenuation requirement. However, this
significantly reduces the IF roll-off rate. Since large out-of-band truncation noise is added by
the DSM after the IF, having a large $\alpha_s$ does not give a significant benefit. Thus, $\alpha_s = 40$dB
is found to be sufficient for this design.
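For readers who want to experiment with the trade-off between filter length and stopband attenuation, the Python sketch below designs a 96-tap Kaiser-windowed lowpass for interpolation by 8. It is not the design flow used in this work: the mapping from attenuation to the window parameter is Kaiser's standard empirical formula, the cutoff is placed at the ideal band edge, and all function names are hypothetical.

```python
import cmath
import math

def bessel_i0(x, terms=30):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def kaiser_beta(atten_db):
    """Kaiser's empirical mapping from stopband attenuation (dB) to window beta."""
    if atten_db > 50:
        return 0.1102 * (atten_db - 8.7)
    if atten_db >= 21:
        return 0.5842 * (atten_db - 21) ** 0.4 + 0.07886 * (atten_db - 21)
    return 0.0

def kaiser_interp_filter(num_taps, interp, atten_db):
    """Windowed-sinc lowpass with cutoff 0.5/interp cycles/sample and DC gain = interp."""
    beta = kaiser_beta(atten_db)
    centre = (num_taps - 1) / 2.0
    cutoff = 0.5 / interp
    taps = []
    for n in range(num_taps):
        t = n - centre
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        arg = max(0.0, 1.0 - (t / centre) ** 2)
        taps.append(ideal * bessel_i0(beta * math.sqrt(arg)) / bessel_i0(beta))
    scale = interp / sum(taps)
    return [c * scale for c in taps]

def gain_db(taps, freq):
    """Magnitude response in dB at freq (cycles/sample at the high rate)."""
    acc = sum(c * cmath.exp(-2j * math.pi * freq * n) for n, c in enumerate(taps))
    return 20 * math.log10(abs(acc))

taps = kaiser_interp_filter(num_taps=96, interp=8, atten_db=40)
print(len(taps), gain_db(taps, 0.0), gain_db(taps, 0.2) - gain_db(taps, 0.0))
```

Rerunning the sketch with a different `num_taps` or `atten_db` shows the behaviour described above: more taps sharpen the transition, while a higher attenuation target widens it for a fixed length.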
Figure 3.2 shows different responses of the IF with and without quantization of the 96 co-
efficients. In simulations, the IF coefficients are first obtained from the “windowed” impulse
response with full-precision. In an actual implementation, these coefficients are rounded-
off due to the fixed-length multipliers. For a large digital system, multipliers occupy large
area and slow down the operating speed. However, if the coefficients are quantized using
canonic signed digit (CSD) representation, where they are represented as sums or differences
of powers of 2, only adders and subtractors are required. This eliminates the need for digital
multipliers and ultimately results in an effective and robust digital implementation as long
as the discrepancies are acceptable. In this work, the quantization algorithm
allocates one CSD term to coefficients with magnitude below 0.1 and two CSD terms to those above 0.1.
In-depth details on the multiplierless IF hardware implementation will be discussed
in chapter 4.
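A minimal Python sketch of this kind of coefficient quantization is given below. It approximates each coefficient greedily by a sum of signed powers of two, using one term below the 0.1 threshold and two terms otherwise. The greedy search and the function names are assumptions for illustration; the actual algorithm used for the chip may select terms differently.

```python
import math

def nearest_power_of_two(x):
    """Signed power of two closest to a nonzero x (ties resolve to the smaller magnitude)."""
    exponent = math.floor(math.log2(abs(x)))
    low, high = 2.0 ** exponent, 2.0 ** (exponent + 1)
    magnitude = low if abs(x) - low <= high - abs(x) else high
    return math.copysign(magnitude, x)

def signed_power_sum(coeff, terms):
    """Greedy approximation of coeff by at most `terms` signed powers of two."""
    approx, residual = 0.0, coeff
    for _ in range(terms):
        if residual == 0:
            break
        step = nearest_power_of_two(residual)
        approx += step
        residual -= step
    return approx

def quantize_coefficient(coeff, threshold=0.1):
    """One term for small coefficients, two terms for larger ones."""
    return signed_power_sum(coeff, 1 if abs(coeff) < threshold else 2)

for c in (0.07, 0.4375, -0.3):
    print(c, quantize_coefficient(c))
```

Each quantized coefficient can then be applied with at most one shift, or two shifts and one add or subtract, which is why no multiplier is needed.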
In these responses, there is little discrepancy between the ideal and quantized IF in the
passband. Outside the passband, especially beyond $2 f_B$, the attenuation of the quantized
IF degrades significantly. However, this is acceptable since within the critical band ($f_B$ to
$2 f_B$), the attenuation is still around 40dB as intended. Beyond this band, the truncation
noise becomes dominant, so the reduction in stopband attenuation does little
damage. Nevertheless, the quantized IF attenuates all images by at least 28dB over the
entire stopband, which still helps relax the analog filter’s attenuation requirement.
Figure 3.2: A 95th-order FIR interpolation filter with and without coefficient quantization
From figure 3.2(b), the -3dB bandwidth is around 235MHz. The passband ripple is
approximately 0.2dB, which is quite acceptable. Table 3.1 summarizes the IF design method
as well as its performance.
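Table 3.1: IF design method and performance
Category                      Parameter                        Value
Design                        Window                           Kaiser
Design                        Polyphase length (pl)            96
Design                        Filter order (l)                 95
Design                        Stopband attenuation (αs)        40 dB
Performance (Quantized IF)    -3dB Bandwidth (BW−3dB)          235 MHz
Performance (Quantized IF)    Passband ripple                  0.2 dB
Performance (Quantized IF)    Stopband attenuation             ≥ 28 dB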
3.2. Time-interleaved Interpolation Filter 33
Thus, the IF is a 95th order FIR filter, G(z), of the following form:
$$G(z) = \sum_{k=0}^{l} z^{-k} g(k) = g(0) + g(1) z^{-1} + g(2) z^{-2} + \cdots + g(l) z^{-l}, \quad l = 95 \tag{3.1}$$
That is:
Based on Ik (z) and equation 2.14 from section 2.4, G(z) of a TIM-IF is given by:
$$\mathbf{G}(z) = \begin{bmatrix} I_0(z) \\ z^{-1} I_7(z) \\ z^{-1} I_6(z) \\ \vdots \\ z^{-1} I_1(z) \end{bmatrix} \tag{3.4}$$
Figure 3.3 shows the realization of a TIM-IF based on the conventional IF for this work.
Notice that aside from the first path (U1 ), all subsequent paths (U2 to U8 ) use the polyphase
components in reverse order (I7 (z) down to I1 (z)).
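The polyphase idea behind the TIM-IF can be checked numerically. The sketch below is only an illustration in Python (the thesis models are in Matlab) and uses hypothetical integer taps rather than the actual Kaiser-window coefficients. It shows that eight 12-tap sub-filters running at the low rate reproduce the output of a conventional zero-stuffing interpolator with a 96-tap filter. It uses the textbook commutator ordering and does not attempt to reproduce the path ordering of the block-filter form in equation 3.4.

```python
def polyphase_components(g, L=8):
    """Split taps g into L sub-filters: component k holds g[k], g[k+L], g[k+2L], ..."""
    return [g[k::L] for k in range(L)]


def fir(x, h):
    """Direct-form FIR with zero initial state: y[n] = sum_i h[i] * x[n-i]."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def interpolate_direct(x, g, L=8):
    """Conventional interpolator: insert L-1 zeros per sample, then filter at the high rate."""
    up = []
    for sample in x:
        up.append(sample)
        up.extend([0] * (L - 1))
    return fir(up, g)


def interpolate_polyphase(x, g, L=8):
    """Polyphase interpolator: L short filters at the low rate, outputs interleaved."""
    branches = [fir(x, h) for h in polyphase_components(g, L)]
    out = []
    for n in range(len(x)):
        for k in range(L):
            out.append(branches[k][n])
    return out


# Hypothetical integer taps (96 taps = 95th order) and input, chosen only for the check.
taps = [(7 * i) % 11 - 5 for i in range(96)]
signal = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, 9, -7, 9, 3]
same = interpolate_direct(signal, taps) == interpolate_polyphase(signal, taps)
```

Each sub-filter has 12 taps, so every output sample costs 12 multiply-accumulates at the low rate instead of 96 at the high rate.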
(a) Conventional IF
(b) Time-interleaved-by-8 IF
As mentioned in section 2.2.1, the main parameters that control the SNR performance are:
the OSR, modulator order (m) and number of truncator bits (k ). Based on these parameters,
the maximum SNR can be estimated according to the following calculations.
Let the input signal be a sinusoidal wave. Its full-swing amplitude, A, is defined as
3.3. Time-interleaved ∆Σ Modulator 35
2^k (∆/2), where ∆ is the unit quantization step size (1 LSB, least significant bit) and k is
the number of truncator bits. Hence, the signal power, Ps , is given by ([37], Ch.14):
$$ P_s = \frac{A^2}{2} = \left(\frac{2^k \Delta}{2\sqrt{2}}\right)^2 = \frac{2^{2k} \Delta^2}{8} \tag{3.5} $$
$$ SNR_{max} = 10\log\left(\frac{P_s}{P_e}\right) = 10\log\left(\frac{3(2m+1)\,2^{2k-1}}{\pi^{2m}}\,(OSR)^{2m+1}\right) \tag{3.7} $$
According to table 1.1, the OSR is 8. This leaves only m and k to be determined. From
the stability analysis of an EFB-DSM in section 2.2, the modulator order should be at least
1 less than the number of truncator bits (i.e.: m ≤ k − 1). Based on equation 3.7, choosing
m=3 and k=4 results in an SNR of 68dB which allows sufficient design margin beyond the
target of 56dB.
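As a quick numerical check of equation 3.7, the following Python sketch evaluates the expression for the chosen parameters. The function name is an illustrative choice, and the logarithm is taken as base 10.

```python
import math


def snr_max_db(osr, m, k):
    """Evaluate equation 3.7: peak SNR in dB for oversampling ratio osr,
    modulator order m and k truncator bits."""
    ratio = 3 * (2 * m + 1) * 2 ** (2 * k - 1) / math.pi ** (2 * m) * osr ** (2 * m + 1)
    return 10 * math.log10(ratio)


snr_chosen = snr_max_db(osr=8, m=3, k=4)   # design point used in this work
margin_db = snr_chosen - 56                # margin over the 56dB target
```

For OSR = 8, m = 3 and k = 4 this evaluates to roughly 67.7dB, which rounds to the 68dB quoted above.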
$$ NTF(z)_{conv} = (1 - z^{-1})^3 \tag{3.8} $$
in which all zeros and poles are located at z=1 and z=0, respectively.
According to ([1], Ch. 4), significant improvement in SNR can be obtained by optimizing
the NTF zero locations. By spreading the zeros along the z-domain unit circle, the total
inband noise power can be reduced. The optimal NTF zeros can be found by equating the
partial derivatives of the noise power to zero. The mathematical derivations are not discussed
here and the optimization is done using a built-in function in Richard Schreier’s Delta-Sigma
Toolbox [38].
Although moving the poles closer to the zeros reduces the out-of-band gain (OBG) and thereby
improves stability, this was not done here. As discussed in section 2.2, under the stability
criterion in [33], an EFB system can remain stable while tolerating a much
higher OBG than an OFB system. Thus, there was no need for NTF pole optimization.
Figure 3.4(a) shows the pole-zero plot of the optimized NTF for a 3rd -order DSM. Optimizing
the NTF zeros results in a notch at DC and another one at $\sqrt{3/5} \cdot f_B$, or equivalently
$\sqrt{3/5} \cdot f_S / (2 \cdot OSR)$ ([1], Ch. 4). This improves the SNR by 8dB compared to the case
where all zeros are at DC.
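The benefit of spreading the zeros can be estimated independently of the toolbox. The Python sketch below is a rough check, not the optimization routine of the Delta-Sigma Toolbox. It assumes an NTF of the form (1 − z^−1)(1 − 2cos(ω0)z^−1 + z^−2) with all poles at z=0, and it integrates |NTF|^2 over the signal band with a simple midpoint rule. The name a_csd refers to the 3-term CSD value that appears later in equation 4.2.

```python
import math


def ntf_power(w, w0):
    """|NTF(e^{jw})|^2 for (1 - z^-1)(1 - 2cos(w0) z^-1 + z^-2), poles at z = 0."""
    dc_zero = 4.0 * math.sin(w / 2.0) ** 2
    zero_pair = (2.0 * math.cos(w) - 2.0 * math.cos(w0)) ** 2
    return dc_zero * zero_pair


def inband_noise_gain(w0, osr, steps=4000):
    """Midpoint-rule integral of |NTF|^2 from DC to the band edge pi/osr."""
    dw = (math.pi / osr) / steps
    return sum(ntf_power((i + 0.5) * dw, w0) for i in range(steps)) * dw


OSR = 8
w_band = math.pi / OSR                      # band edge in rad/sample
w0_opt = math.sqrt(3.0 / 5.0) * w_band      # notch at sqrt(3/5) of the band edge

improvement_db = 10 * math.log10(
    inband_noise_gain(0.0, OSR) / inband_noise_gain(w0_opt, OSR))

# Expanding the assumed NTF gives 1 - a z^-1 + a z^-2 - z^-3 with a = 1 + 2cos(w0).
a_opt = 1.0 + 2.0 * math.cos(w0_opt)
a_csd = 2 ** 1 + 2 ** 0 - 2 ** -3           # 3-term CSD value used in equation 4.2
```

Under these assumptions the in-band noise reduction comes out close to 8dB, and the optimized coefficient is about 2.908 compared with the CSD value of 2.875.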
Similar to the TIM-IF, the NTF coefficients must be quantized using CSD representation
for digital realization. Since there are only 3 taps for a 3rd -order DSM, large discrepancies
between quantized and non-quantized coefficients degrade the SNR performance. Hence,
more CSD terms are used to keep the quantized NTF coefficients close to their optimized
values. Here, they are represented by 3 CSD terms as given below.
Figure 3.4(b) overlays the response of all NTF versions. The quantization results in a
slight degradation of inband noise shaping and a shift in notch location closer to the band
edge. However, these have a small impact on the SNR performance which will be quantified
in a later section.
Figure 3.5 shows the Matlab model of a 3rd -order ∆Σ-DAC with conventional IF and
conventional DSM. Here, the 10-bit quantizer at the input generates a 10-bit digital stream
while the 4-bit quantizer near the DAC represents the 4-bit truncator with digital limiter.
Using the steps from section 2.3, a conventional DSM can be transformed into a TIM-DSM.
Similar to the TIM-IF, the 8-component polyphase decomposition of H(z) from 3.11 is:
$$ H(z) = \sum_{k=0}^{7} z^{-k} E_k(z^8) = E_0(z^8) + z^{-1} E_1(z^8) + \cdots + z^{-7} E_7(z^8) \tag{3.12} $$
$$
\begin{aligned}
E_0(z) &= 0 & E_4(z) &= 0 \\
E_1(z) &= a & E_5(z) &= 0 \\
E_2(z) &= -a & E_6(z) &= 0 \\
E_3(z) &= 1 & E_7(z) &= 0
\end{aligned}
\tag{3.13}
$$
Next, substitute the above polyphase components Ek (z) into equation (2.12) to get:
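$$
H(z) =
\begin{bmatrix}
E_0(z) & E_1(z) & E_2(z) & \cdots & E_7(z) \\
z^{-1}E_7(z) & E_0(z) & E_1(z) & \cdots & E_6(z) \\
z^{-1}E_6(z) & z^{-1}E_7(z) & E_0(z) & \cdots & E_5(z) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
z^{-1}E_1(z) & z^{-1}E_2(z) & z^{-1}E_3(z) & \cdots & E_0(z)
\end{bmatrix}
=
\begin{bmatrix}
0 & a & -a & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & a & -a & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & a & -a & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & a & -a & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & a & -a & 1 \\
z^{-1} & 0 & 0 & 0 & 0 & 0 & a & -a \\
-az^{-1} & z^{-1} & 0 & 0 & 0 & 0 & 0 & a \\
az^{-1} & -az^{-1} & z^{-1} & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\tag{3.14}
$$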
Lastly, using the relation H ij which corresponds to the contribution of the j th input to
the ith output, the architecture of a TIM-DSM for this work can be realized as depicted in
figure 3.6.
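The structure of the matrix in equation 3.14 can be generated mechanically from the polyphase components. The Python sketch below is illustrative only. It assumes every Ek is a constant (as in equation 3.13) and represents each entry as a pair (coefficient, delay), where the delay counts z^−1 elements at the low rate. The value a = 2.875 is the CSD-quantized coefficient from equation 4.2.

```python
def tim_matrix(E):
    """Build the M x M block-filter matrix from constant polyphase components E[0..M-1].

    Entry (i, j) is E[(j - i) mod M]; entries below the main diagonal carry one
    extra low-rate delay. Each entry is returned as (coefficient, delay)."""
    M = len(E)
    matrix = []
    for i in range(M):
        row = []
        for j in range(M):
            coeff = E[(j - i) % M]
            delay = 1 if (j < i and coeff != 0) else 0   # a delay on a zero entry is irrelevant
            row.append((coeff, delay))
        matrix.append(row)
    return matrix


a = 2.875                          # 2^1 + 2^0 - 2^-3
E = [0, a, -a, 1, 0, 0, 0, 0]      # E0..E7 from equation 3.13
H = tim_matrix(E)

# Row i lists the contribution of every input j to output i.
last_row = H[7]
```

Reading the result row by row gives the same non-zero pattern as the right-hand matrix of equation 3.14, for example (a, 1), (−a, 1), (1, 1) at the start of the last row.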
(a) SNR vs. Input amplitude (b) SNR vs. Input frequency
Figure 3.8 shows the TIM-IF-DSM output SNR and SNDR versus input amplitude
for a single tone at 0.25fB for the non-optimized (NTF), optimized (NTF_opt) and quantized
optimized (NTF_quan) NTF. In these simulations, the input is quantized to 10 bits, but
internal computations are performed with full precision even when the TIM-IF and TIM-DSM
coefficients are quantized. As discussed earlier, this figure shows an SNR improvement of
8dB from NTF to NTF_opt. Compared to NTF_opt, NTF_quan shows a 2dB
degradation in SNR but less than 1dB degradation in SNDR. This implies that NTF_quan is an
acceptable design.
(a) SNR vs. Input amplitude (b) SNDR vs. Input amplitude
Figure 3.8: TIM-IF-DSM performance for a single tone at 0.25fB for non-optimized, opti-
mized and quantized optimized NTF (Matlab simulations)
Figure 3.9 shows the TIM-IF-DSM output spectrum for different input frequencies, rang-
ing from 0.13fB to 0.93fB . For input frequencies below 0.33fB , the odd harmonics, caused
by truncation error, show up as inband tones while above this frequency, the odd harmonics
are out of band. Although the harmonics are less of a concern for high-frequency inputs, the
output amplitude is attenuated due to the band limitation of both practical digital IF and
analog filter. In general, while the low-frequency degradation is dominated by the inband
harmonics, the high-frequency degradation is dominated by the IF filter bandwidth.
Figure 3.10(a) shows the TIM-IF-DSM performance versus input amplitude for an input
tone at 0.25fB . The SNDR degrades by approximately 4dB with respect to that of SNR
strictly due to inband harmonics. On the other hand, figure 3.10(b) shows the TIM-IF-DSM
(a) (b)
(c) (d)
Figure 3.9: TIM-IF-DSM output spectrum for Matlab simulations with 0dBFS input ampli-
tude at different frequencies a) 0.13fB b) 0.25fB c) 0.50fB d) 0.93fB
3.4. Digital-to-Analog Converter Model 43
(a) SNR and SNDR vs. Input amplitude (b) SNR and SNDR vs. Input frequency
performance versus input frequency for an input amplitude of 0dBFS. It shows that the
SNDR degradation compared to SNR is only prominent for input frequencies below 0.33fB
(where the 3rd harmonic falls inband). For higher frequencies, the SNDR is identical to
SNR which remains around 60dB up to 0.8fB ; after which, it starts to degrade due to the
dominance of the TIM-IF’s frequency response (in figure 3.2(b)). Also, figure 3.10(b) shows a
performance of at least 8.8 bits for the entire input frequency band, which is quite acceptable
for the targeted applications of this work.
3.6. Summary
In summary, this chapter presents the design details of each sub-block in a TIM ∆Σ-DAC.
Figure 3.11 shows a complete Matlab model of this DAC. Note that the parallel paths must
be multiplexed in reverse order to generate a correct output. Architectural-level simulation
results are presented together with design trade-offs and decisions. All sub-blocks except
the analog filter (i.e.: the TIM-IF, TIM-DSM, MUX, and CS-DAC) will be integrated in the
STMicroelectronics 90nm CMOS process.
This chapter discusses the physical implementation of a TIM ∆Σ-DAC fabricated using
STMicroelectronics 90nm CMOS process. It consists of 3 parts as depicted in figure 4.1(b): a
digital baseband front-end, a high-speed digital interface, and a high-speed analog back-end.
Unlike the conventional ∆Σ-DAC in figure 4.1(a) which operates entirely at fS = OSR · fN ,
only the interface and analog section of the TIM ∆Σ-DAC operate at this speed, while the
main digital portion operates at fN .
48 Chapter 4. Time-interleaved ∆Σ-DAC Implementation
Multiplierless Implementation
The multiplierless technique makes hardware complexity proportional to the number of non-
zero bits (i.e.: logic 1’s) in the filter coefficients. For a further optimization, a canonic sign
4.1. Digital Baseband Front-End 49
digit (CSD) representation can be used where the constant coefficients are represented using
the fewest possible number of non-zero bits. It is a signed power-of-2 representation, in which
each digit is in the set {0, 1, 1̄} (where 1̄ = −1) [41]. Here, the coefficients are represented as
sums or differences of the fewest possible power-of-2 terms.
For the above example, by converting B = 0.1110₂ to its CSD form B = 1.001̄0₂ (that is,
1 − 2^−3), A × B can be implemented using only 1 shifter and 1 adder as: A − (A >> 3).
Compared to a binary representation, CSD reduces the hardware further because fewer
shifters and adders are required.
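The conversion from binary to CSD can be sketched in a few lines of Python. This is a generic non-adjacent-form routine given for illustration, not the quantization script used in this work. Coefficients are treated as non-negative integers scaled by 2^frac_bits, so B = 0.1110₂ with four fractional bits becomes the integer 14.

```python
def to_csd(n):
    """Return the CSD (non-adjacent form) digits of a non-negative integer, LSB first.

    Each digit is -1, 0 or +1, and no two adjacent digits are both non-zero."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)      # +1 if n mod 4 == 1, -1 if n mod 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(a, digits, frac_bits):
    """Multiply integer a by the fixed-point constant described by CSD digits,
    using only shifts, additions and subtractions."""
    acc = 0
    for i, d in enumerate(digits):
        if d == 1:
            acc += a << i
        elif d == -1:
            acc -= a << i
    return acc >> frac_bits


b_digits = to_csd(0b1110)                 # B = 0.1110 in binary, 4 fractional bits
nonzero_csd = sum(1 for d in b_digits if d != 0)
nonzero_bin = bin(0b1110).count("1")
product = csd_multiply(40, b_digits, 4)   # 40 x 0.875
```

The binary form of B has three non-zero bits while the CSD form has two, which is where the saving of one adder comes from.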
Since the input to the TIM ∆Σ-DAC already has a fixed word-length (10 bits), the input
quantization error is not applicable in this work. The remaining two sources of errors will
be considered in this section.
In this work, one of the major challenges is the physical implementation of the 95th -order IF.
Due to its high order, coefficient quantization can have a significant effect on its stopband
attenuation. Fortunately, since the IF in this work is integrated with a “noise-shaping”
DSM, some degradation in the IF’s stopband attenuation can be tolerated. The out-of-band
noise will be dominated by a large amount of shaped truncation noise introduced after the
TIM-DSM. Hence, the IF implementation is acceptable as long as it preserves the passband
response while providing a reasonable amount of attenuation in the stopband, as shown
previously in figure 3.2.
• One nonzero digit in the CSD code is typically required for each 20dB of stopband
attenuation in the filter specifications.
Recall from the IF design in chapter 3, the filter’s stopband attenuation was 40dB. Thus,
two nonzero CSD digits are generally used to represent each filter coefficient.
Roundoff errors are inevitable in fixed-length digital operations. There has been much re-
search to reduce these deterministic errors. In [44], an adaptive carry generation circuitry,
based on an exhaustive simulation or statistical analysis, is used to approximate the roundoff
errors being compensated. Inspired by this idea of carry compensation, the rounding scheme
in this work uses both an exact and an approximate carry as shown in figure 4.2.
Specifically, to obtain a y-bit output from the sum of x-bit inputs (where x > y), all
computations are done using (y + 1) bits, where the extra bit represents the exact carry. To
account for the truncated (x − y − 1) bits, the MSB of this portion is added to the (y + 1)-bit
sum; this MSB represents the approximate carry. Finally, the (y + 1)-bit sum is truncated to
a y-bit output at the last stage. In the example below, three 8-bit numbers are to be added
then truncated to form a 4-bit sum.
The correct result is approximately 10.8 in a decimal representation, where the 4-bit
truncation is included by multiplying by 2^−4 . The first truncation method, which does not
include any rounding, results in the largest error (∆=1.8). The second truncation method,
which includes a 1-bit approximate carry, results in a smaller error (∆=0.8). Lastly, the
proposed truncation method, which includes a 1-bit exact and 1-bit approximate carry,
results in the smallest error (∆=0.2).
Note that with more exact-carry bits, even higher accuracy can be achieved. However,
this would degrade the speed and increase the area for a small improvement in accuracy. For
this design, a 1-bit exact carry and 1-bit approximate carry give a good trade-off between
these design considerations.
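The three truncation methods can be compared with a small behavioural sketch in Python. This is one possible reading of the scheme, in which the carries are applied per operand. The wiring of the carries in figure 4.2 is not reproduced, and the three operands are hypothetical values, not the ones used in the thesis example.

```python
def plain_truncation(values, x, y):
    """Keep only the top y bits of each x-bit operand, then add."""
    shift = x - y
    return sum(v >> shift for v in values)


def approx_carry_only(values, x, y):
    """Keep y bits per operand and add the MSB of the dropped bits as an approximate carry."""
    shift = x - y
    return sum((v >> shift) + ((v >> (shift - 1)) & 1) for v in values)


def exact_plus_approx_carry(values, x, y):
    """Keep y+1 bits per operand (the extra bit carries exactly), add the MSB of the
    remaining x-y-1 dropped bits as an approximate carry, and truncate to y bits last.
    Requires x - y >= 2."""
    shift = x - y - 1
    acc = sum((v >> shift) + ((v >> (shift - 1)) & 1) for v in values)
    return acc >> 1


operands = [0b00101011, 0b00111000, 0b01001100]   # hypothetical 8-bit inputs: 43, 56, 76
exact = sum(operands) / 2 ** 4                     # 10.9375 after scaling by 2^-4

r_plain = plain_truncation(operands, 8, 4)
r_approx = approx_carry_only(operands, 8, 4)
r_both = exact_plus_approx_carry(operands, 8, 4)
errors = [abs(r - exact) for r in (r_plain, r_approx, r_both)]
```

For these operands the three methods give 9, 12 and 11 against an exact value of 10.9375, so the error shrinks in the same order as described above.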
As mentioned in chapter 2, pipelined or parallel adders are required in this design to minimize
the critical path delay. Many different adder architectures are possible; the best choice
depends on the specific design. For example, a ripple carry adder (RCA) has the smallest
area and lowest power but also slowest speed. On the other hand, a carry look-ahead adder
(CLA) has the fastest speed but its power consumption is relatively high. A carry select
adder (CSA) is a compromise between the high-speed operation of the CLAs and the low-
power consumption of the RCAs ([45], Ch. 7). Thus, the CSA architecture is used in this
work.
Figure 4.3 shows the architecture of a CSA, which consists of two full adders (FAs) for
each bit’s addition: one FA assumes the carry-in (Cin ) is ‘1’ while the other assumes the Cin
is ‘0’. The FAs are grouped into “stages”, each of which is an RCA. At each stage, the Cin is
obtained from the previous stage, except for the first stage where Cin is an input. This Cin
selects one of the two sums, and one of the two carries, through simple 2-to-1 multiplexers.
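The select mechanism can be modelled behaviourally as follows. This Python sketch is a functional model only: each stage is computed with integer addition instead of gate-level full adders, and no timing is modelled. The 1-1-1-2-3 staging in the usage lines is the 8-bit staging quoted later in this section.

```python
def ripple_stage(a, b, cin, width):
    """One RCA stage: add two width-bit slices plus carry-in, return (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, staging, cin=0):
    """Carry-select addition. staging lists the stage widths, LSB stage first.

    Every stage computes both candidate results (Cin = 0 and Cin = 1); the carry
    arriving from the previous stage then selects one sum and one carry-out."""
    result, shift, carry = 0, 0, cin
    for width in staging:
        mask = (1 << width) - 1
        a_slice, b_slice = (a >> shift) & mask, (b >> shift) & mask
        sum0, cout0 = ripple_stage(a_slice, b_slice, 0, width)
        sum1, cout1 = ripple_stage(a_slice, b_slice, 1, width)
        stage_sum, carry = (sum1, cout1) if carry else (sum0, cout0)   # the 2-to-1 multiplexers
        result |= stage_sum << shift
        shift += width
    return result, carry


STAGING_8BIT = (1, 1, 1, 2, 3)
total, carry_out = carry_select_add(0b10110111, 0b01101001, STAGING_8BIT)
```

Because both candidates of a stage are ready before the incoming carry arrives, the carry only has to pass through one multiplexer per stage, which is why the staging determines the critical path.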
The critical path, and hence the maximum operating speed of the CSA depends to a great
extent on the number of bits allocated to each stage. For example, a staging of (4-4-4-4-4-4-
4-4) for a 32-bit adder does not result in the maximum speed due to the multiplexing delay
of the carry path([45], Ch. 7). The optimal CSA staging depends on the specific technol-
ogy and adder word-length. Table 4.1 shows the CSA staging that results in the shortest
critical path for different CSA lengths. It also shows the estimated and synthesized delay
using 90nm CMOS standard-Vt digital libraries (CORE90GPSVT and CORX90GPSVT). A
combination of RCAs for low-bit adders (4-5 bits) and CSAs for medium to high-bit adders
can be used for further speed optimization. However, for simplicity, CSAs are used for all
adder word-lengths. The CSA delay can be estimated as:
For example, for an 8-bit CSA, which has 1-1-1-2-3 staging, tCSA = 4(tMUX ) + 3(tFA ).
for example, sum tree 8 actually belongs to path 2 and so on. For an N-bit input, the word-
length at the output of the TIM-IF should be (N+1) bits to account for digital arithmetic
overflow. This overflow bit is also shared with the TIM-DSM. Thus, it is not necessary to
provide another overflow bit for the TIM-DSM, which never overflows thanks to its
digital limiter. In this work, the input and output are 10 and 11 bits, respectively.
For each path, there are 12 coefficients with each being represented by 1 to 3 CSD terms.
Thus, there are many CSD terms and summing operations involved for each path. This
makes the fixed word-length output (11 bits) a challenging task. To meet the fixed word-
length requirement and also to minimize the delay, a custom “sum tree” is created for each
TIM-IF path. Notice that the coefficients for paths 2 & 8 are the same but in reverse order,
thus the same sum tree design can be used. The same applies to paths 3 & 7 and paths 4 &
6. Figure 4.5 shows an example of a sum tree for path 2 & 8; the sum trees for the remaining
paths can be found in appendix C.
All sum trees use the CSA described in section 4.1.3. Binary sign extension is used
whenever needed to ensure that both inputs to each CSA have the same word-length. The
shortest word-length terms are summed up first, then the longest terms last. Also, the
proposed rounding scheme is applied to each sum tree: the approximate carry-ins (e.g.: R1,
R2, etc) are fed into all available CSAs, while the exact carry-ins are part of the CSD terms
from the beginning. All intermediate sums are computed with one extra bit, except the last
summation where the final output is rounded off to the desired length. Overall, the sum
tree ensures that a carry is accounted for at every CSA and maintains a final output at a fixed
word-length of 11 bits.
Synthesized timing simulation results for this TIM-IF can be found in Appendix C.
Based on H(z) and figure 3.6, the output of each TIM-DSM path is:
$$ P_x(z) = U_x(z) + (2^1 + 2^0 - 2^{-3}) E_{x+1}(z) - (2^1 + 2^0 - 2^{-3}) E_{x+2}(z) + E_{x+3}(z) \tag{4.2} $$
where Ex (z) is the truncation error from the xth path of the TIM-DSM, and Ux (z) is the
output of the xth path of the TIM-IF.
The sum tree for a TIM-DSM path is shown in figure 4.6. The same summation and
rounding scheme used in the TIM-IF are used for the TIM-DSM to maintain an 11-bit word-
length output. Recall from chapter 2 that a digital limiter was integrated with the DSM.
This ensures that the modulator will operate with a fixed word-length of 11 bits and saturate
to the largest digital value in case of an overflow. In simulation, an overflow
occurs only when the input amplitude to the TIM-IF-DSM (TIM-IF + TIM-DSM) is close to the
full-scale value, namely > −0.5dBFS. In these cases, even though the TIM-DSM saturates,
the full system simulation still indicates good performance. Therefore, it is not necessary to
assign another overflow bit to the TIM-DSM. In fact, having an overflow bit for the TIM-
DSM would deteriorate performance since this 12th bit would usually be ‘0’ and hence would
not contain any information. Thus, after bit truncation, only 3 out of 4 bits would actually
contain meaningful data, resulting in a loss of output amplitude.
Synthesized timing results for this TIM-DSM can be found in Appendix C. Since the
digital front-end contains both the TIM-IF and TIM-DSM (i.e.: TIM-IF-DSM), unless men-
tioned otherwise, the behavioural simulations and physical design will contain both blocks.
(a) (b)
(c) (d)
Figure 4.8: TIM-IF-DSM output spectrum for VHDL behavioural simulations with 0dBFS
input amplitude at different frequencies: a) 0.13fB b) 0.25fB c) 0.50fB d) 0.93fB
(a) SNR vs. Input amplitude (b) SNDR vs. Input amplitude
(c) SNR vs. Input frequency (d) SNDR vs. Input frequency
4.2.1. Multiplexer
Compared to a conventional ∆Σ-DAC, the only additional block in a time-interleaved ∆Σ-
DAC is the multiplexer, as shown in figure 4.1. The purpose of the multiplexer is to serialize
the parallel time-interleaved paths down to a single path. The multiplexing factor is
identical to the time-interleaving factor, namely 8. Traditionally, an 8-to-1
multiplexer that achieves 4GS/s data rate would require three different clock rates: 500MHz,
1GHz, and 2GHz. This work proposes an 8-to-1 “ring” multiplexer which only requires a
single clock rate of 4GHz.
The 8-to-1 ring multiplexer consists of three parts, as depicted in figure 4.10: a ring shift
register, a switch shift register, and a data multiplexer. The ring shift register consists of 8
cascaded DFFs clocked at 4GHz (CLKa). It also consists of 2 transmission gates to set the
ring’s initial state to a known value (of logic 1) at power-up. This ring creates a pulse signal,
S0 , which has a period of 2ns and a pulse width equal to the CLKa period, namely 250ps. S0
has two purposes: it is used as the 500MHz clock pulse (CLKsw ) for the DFFs in the data
multiplexer, and it is also used to generate the 8 switch signals (S1 − S8 ) in the switch shift
register. Instead of taking S1 − S8 directly from the ring shift register, a separate series of 8 DFFs
(i.e.: switch shift register) is needed because S1 − S8 are not consecutively shifted (by 1
clock cycle) versions of S0 , but rather its delayed versions. These signals activate the switches
of the data paths in reverse order, as shown in figure 4.10.
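The serializing action of a one-hot ring can be illustrated with a short behavioural model in Python. It only captures the logical behaviour: one switch is active per high-rate clock cycle and the pattern repeats every 8 cycles. Clock-tree delay, the separate switch shift register and DFF timing are not modelled, and the mapping from switch position to data path (the order argument) is an assumption, since the exact wiring of figure 4.10 is not reproduced here.

```python
def ring_mux_serialize(paths, order=None):
    """Serialize N parallel low-rate paths into one high-rate stream.

    paths: list of N equal-length lists, one list of samples per parallel path.
    order: order[slot] is the path switched on when the one-hot ring is at slot.
           Defaults to 0, 1, ..., N-1 (an assumption of this sketch)."""
    n = len(paths)
    order = list(range(n)) if order is None else list(order)
    ring = [1] + [0] * (n - 1)            # one-hot state set at power-up
    stream = []
    for frame in range(len(paths[0])):    # one frame = one low-rate clock period
        for _ in range(n):                # N high-rate clock cycles per frame
            slot = ring.index(1)
            stream.append(paths[order[slot]][frame])
            ring = ring[-1:] + ring[:-1]  # shift the single '1' by one position
    return stream


# Hypothetical data: sample value = 10 * path index + frame index.
data = [[10 * p + f for f in range(2)] for p in range(8)]
forward = ring_mux_serialize(data)
reverse = ring_mux_serialize(data, order=range(7, -1, -1))
```

Because the ring holds exactly one logic 1, only one data path drives the output at any time, and a single clock rate is enough to step through all eight paths.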
Figure 4.11 shows the timing diagram for an 8-to-1 ring multiplexer. Before S0 is used as
CLKsw , it has to go through a clock tree, which consists of 5 stages of branching fanout-of-2
buffers to drive 32 DFFs (since there are 8 parallel paths and 4 bits/path). Thus, CLKsw is
delayed by tclk tree from S0 .
To meet the correct timing, the switch signals need to be aligned with the data paths
so they can output valid data. Since the same DFF is used everywhere, if the switch shift
register’s clock is aligned with the data multiplexer’s clock (CLKsw ), then the switch signals
are guaranteed to line up with the data paths.
The switch shift register’s DFFs must be clocked at the same rate as CLKa. However,
this clock is required to be delayed (CLKa dly) such that it will be edge-aligned with CLKsw .
According to figure 4.11, a simple solution is to delay CLKa approximately by one DFF plus
the clock tree’s propagation delay as: tclk dly = tdf f + tclk tree .
A more accurate but complicated solution is to use a PLL to align the phases of CLKa dly
and CLKsw . For this design, it is not necessary to use a PLL for exact alignment since a
skew of 20ps can be tolerated as long as all 8 switch pulses are within one data period. To
ensure proper timing alignment, the output multiplexed data will be re-timed by another
DFF at the switch driver before entering the analog back-end.
Figure 4.12(a) and 4.12(b) show the Cadence simulation results (in TT corner, 27°C) for
the 8-to-1 ring multiplexer (MUX8) operating at 4GHz and 2GHz clock, respectively. Here,
the two clocks (CLKsw and CLKdly ) are aligned within 15ps and all eight switch pulses
(S1 − S8 ) are contained within one data period. Figure 4.12(b) shows that even at a frequency
lower than the design target, the MUX8 still operates properly. Simulations over
different process corners and temperatures (i.e.: SS, 105°C and FF, −40°C) also confirmed the
MUX8’s functionality.
Figure 4.12: An 8-to-1 ring multiplexer transient response (TT corner) a) 4GHz b) 2GHz
4.2. High-Speed Digital Interface 67
The decimal range for a 4-bit binary word is [-8,7] and [0,15] in 2’s complement and
unsigned binary, respectively. Thus, the conversion from 2’s complement to unsigned binary
would only require an addition of 8. This is accomplished by inverting the most significant bit
while the remaining bits (V<2:0> and A<2:0>) are identical in both representations.
The B2T converter’s area is reduced through gate re-use and the propagation delays of its
output codes are matched as much as possible.
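The code conversion described above can be checked with a short Python sketch. It is a behavioural illustration only, not the gate-level B2T converter: the 2’s complement to unsigned step inverts the MSB, and the thermometer step sets as many outputs to 1 as the unsigned value. The function names are illustrative.

```python
def twos_to_unsigned(code, bits=4):
    """Convert a bits-wide 2's complement bit pattern to unsigned binary
    by inverting its most significant bit (equivalent to adding 2^(bits-1))."""
    return (code ^ (1 << (bits - 1))) & ((1 << bits) - 1)


def to_thermometer(value, bits=4):
    """Binary-to-thermometer: 2^bits - 1 outputs, the lowest `value` of them set to 1."""
    return [1 if i < value else 0 for i in range((1 << bits) - 1)]


# Check every 4-bit word: the signed range [-8, 7] maps onto the unsigned range [0, 15].
mapping = {v: twos_to_unsigned(v & 0xF) for v in range(-8, 8)}
thermo_mid = to_thermometer(twos_to_unsigned(0))   # signed 0 maps to unsigned 8
```

A signed input of 0 therefore turns on 8 of the 15 thermometer outputs, and the lower three bits pass through the first step unchanged.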
The thermometer codes are used to switch the DAC’s current-steering cells. Before
entering the analog back-end, these codes need to be re-timed and driven by the switch
drivers. The details on switch drivers are discussed in appendix C.
Figure 4.14: High-speed digital interface Cadence transient response (TT corner)
Practical Considerations
A major challenge in current calibration is matching the output current Iout between the cells.
The main mismatches occur at the calibration switches and the MOS transconductance. The
switch mismatches are due to their sizes, which are required to be small to keep Ileak minimal.
Thus, a mismatch in the charge-injection for each cell is expected [40]. To reduce this effect,
two additional transmission-gate (TG) switches (T2 and T3 ) are added to the main switch
(T1 ) to cancel the charge-injection occurring at the gate of M1 , as depicted in figure 4.15.
To minimize the effect of mismatches between copies of M1 , the transconductance gm can
be made small, thus reducing the drain current’s sensitivity to Vgs variations. To achieve
this task, a secondary current source, I2 , is added in parallel with M1 to sink about 90% of
Iref [37]. Since M1 only sinks the remaining 10% of Iref , its gm can be relatively small.
To achieve a small gm , the W/L aspect ratio should be made as small as possible. Also,
a transistor with a large W and an especially large L has a larger Cgs and improves the matching
of the current cells. Therefore, charge-injection and leakage current effects are reduced in
accordance with equation C.2. However, there is a limitation on how small the W/L aspect
4.3. High Speed Analog Back-End 71
ratio can be depending on the supply headroom of the CMOS process. For STMicroelec-
tronics 90nm CMOS process with a 1V supply and 250mV threshold voltage, the maximum
value for Vgs is approximately 450mV. Consequently, the W/L ratio is around 10/1.
To make the calibration continuous, the cell that is being calibrated needs to be invisible or
taken off-line from the DAC’s output. In place of this cell, a “dummy” identical cell needs
to fill in the gap. Thus, instead of having 2^N − 1 cells for an N-bit DAC, the calibration
network requires 2^N cells with the extra one being a dummy. For a 4-bit DAC, there are 16
calibration cells while there are only 15 current-steering cells as shown in figure 4.16.
The dummy current cell has the same design as a regular current cell, except that its
output is dynamically connected to different cells at different times. The dummy cell
is calibrated first so it is available to fill in for whichever regular cell is in calibration. Each
regular cell is selected one at a time by a 16-stage ring counter operating at 1/Tc . While
this cell is being calibrated for Tc /16 seconds, the calibration switch immediately disconnects
its output from the DAC and switches over to the dummy cell. The dummy cell’s Iout now
becomes the current source for the DAC’s current-steering cell. Upon completion, the switch
returns to the original state and the next cell is calibrated. The dummy cell’s Iout is now
available for the next cell.
Figure 4.16 shows an example when cell 1 is under calibration for a 4-bit DAC. This
technique ensures that there are always 15 equal currents available at the output terminal,
hence allowing the DAC to operate uninterrupted.
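The rotation can be summarized with a small scheduling sketch in Python. It assumes that the 16 ring-counter stages correspond to the dummy cell followed by the 15 regular cells, which is one plausible reading of the description above. Analog effects such as charge injection and leakage are not modelled.

```python
def calibration_schedule(n_cells=15):
    """Return one full calibration cycle as a list of (cell_under_calibration, sources).

    sources[i] names the calibration cell that supplies current-steering cell i+1
    during that slot. Slot 0 calibrates the dummy; every later slot calibrates one
    regular cell while the dummy takes its place."""
    schedule = []
    for slot in range(n_cells + 1):
        under_cal = "dummy" if slot == 0 else slot
        sources = ["dummy" if cell == under_cal else cell
                   for cell in range(1, n_cells + 1)]
        schedule.append((under_cal, sources))
    return schedule


cycle = calibration_schedule()
slot_for_cell_1 = cycle[1]     # cell 1 is off-line, the dummy fills in
```

In every slot the list of sources has 15 entries, so the DAC always sees 15 currents and never the cell that is being calibrated.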
The calibration circuitry, which only consists of a charge-storage MOS transistor and
switches, requires no external components. Thus, it can be integrated together with the
DAC current-steering cells. The calibration simulation results will be shown together with
the current-steering DAC in the next section.
Figure 4.17(a) shows the bias current mirror circuitry, which replicates an off-chip current
source to generate Iref and a current array, Ic<15:0>, to bias the secondary calibration
source I2 . Here, simple current mirrors are used instead of cascode current mirrors due to
the headroom limitation of a 1V analog supply voltage (VDDa). Figure 4.17(b) shows the
dummy calibration cell schematic which supplies Idummy to whichever current-steering cell
is being calibrated.
Figure 4.18 shows the current-steering cell with self-calibration circuitry. Here, the TG
switches, (T1 −T3 ) and the MOS transistors (M1 −M3 ) belong to the calibration cell ; whereas
the other three TG switches (T4 −T6 ) belong to the calibration switch network. The remaining
MOS transistors, M4 − M7 , belong to the current-steering cell, in which, M6 and M7 are the
current-steering (CS) switches.
Once the current-steering schematic is chosen, the next task is to determine the appropriate
output swing, such that it not only ensures the current mirror’s functionality but also meets
the SNR requirement. For an analog supply voltage of 1V, there is little available headroom
to start with.
To maintain the current mirror’s functionality, namely keeping M5 in saturation, the
drain-source voltage of M5 requires at least 300mV (i.e.: Vds5 ≈ 300mV). Since M6 and M7
operate as full-swing switches, there is about 100mV drop across each of them. This leaves
at most 600mV per side for the output swing as shown in figure 4.19(a).
To meet the required SNR of 56dB, the output swing has to be large enough to sufficiently
overcome the output noise. Assuming that the main noise source is dominated by thermal
noise and neglecting flicker (1/f) noise at low frequency, the output noise can be modelled
as illustrated in figure 4.19(b).
Figure 4.19: a) Output swing b) Output noise model c) Simplified output noise model
Here, the current-steering switch is represented by its ON resistance, Rsw , and the off-chip
load is represented by a passive load resistance, RL . The thermal noise of a long-channel MOS
operating in saturation can be represented as a current source, $I_{n,M5}^2$, connected between its
drain and source terminals. The thermal noise of a resistor is a current source, $I_{n,R}^2$, connected
in parallel [46]. The simple representations for these noise sources are:
$$ I_{n,M5}^2 = 4kT\gamma g_m \quad \text{and} \quad I_{n,R}^2 = 4kT/R \tag{4.3} $$
where γ has a value of 2/3 for long-channel transistors but higher for deep sub-micron
transistors. Its exact value varies depending on the CMOS process and is still under research.
For example, in [47], γ has a value around 1.6 and 1.8 for PMOS and NMOS, respectively.
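Using superposition and assuming that $r_{ds5} \gg (R_{sw} + R_L)$:
$$ V_{n,out(r_{ds5})}^2 = \left(\frac{r_{ds5} R_L}{r_{ds5} + R_{sw} + R_L}\right)^2 I_{n,M5}^2 \approx I_{n,M5}^2 R_L^2 \quad (V_{rms}^2/Hz) \tag{4.4} $$
$$ V_{n,out(R_{sw})}^2 = \left(\frac{R_{sw} R_L}{r_{ds5} + R_{sw} + R_L}\right)^2 I_{n,R_{sw}}^2 \approx 0 \quad (V_{rms}^2/Hz) \tag{4.5} $$
$$ V_{n,out(R_L)}^2 = \left(R_L \parallel (r_{ds5} + R_{sw})\right)^2 I_{n,R_L}^2 \approx I_{n,R_L}^2 R_L^2 \quad (V_{rms}^2/Hz) \tag{4.6} $$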
Thus, $V_{n,out}^2$ can be simplified as depicted in figure 4.19(c) and:
$$ V_{n,out}^2 \approx (I_{n,M5}^2 + I_{n,R_L}^2) R_L^2 = 4kT R_L (\gamma g_m R_L + 1) \quad (V_{rms}^2/Hz) \tag{4.7} $$
Equation 4.7 suggests a small value for RL to minimize the output thermal noise, but this
would also decrease the output swing. In this work, the current-steering cells are designed
for RL =50Ω to ease the impedance-matching with the 50Ω test equipment.
Based on the value RL =50Ω and simulations under nominal conditions, an output swing
around 500mV per side or 1V differential would sufficiently yield an SNR of 56dB.
Since the CS-DAC outputs differential currents, the loads at the open-drained outputs can
be either passive or active, as depicted in figure 4.20. Both contain a resistor which converts
differential currents into a differential voltage:
For a passive load in figure 4.20(a), the resistors are connected between the CS-DAC
output and ground. The advantage of a passive load is that there is high bandwidth due
to its simple open-loop configuration. However, the downside is that the output swing is
limited by the available voltage headroom (i.e.: 600mV max) to keep the current source
(M5 ) in saturation. In addition, the CS-DAC’s output resistance per side, Rout , varies
slightly depending on the number of active current cells in use. Rout can be approximated
as:
$$ R_{out} \approx R_L \parallel \frac{r_{o5} + R_{sw}}{N_{cs}} \tag{4.9} $$
where ro5 is the output resistance of M5 , Rsw is the on resistance of switch M6 , and Ncs is
the number of active current cells in use. The variations in Rout directly correspond to the
variations in the output LSB step size, which is a highly undesirable effect that degrades the
CS-DAC’s linearity performance.
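The effect of equation 4.9 on the step size can be illustrated numerically. In the Python sketch below, only RL = 50Ω comes from the text. The values of ro5, Rsw and the unit current are hypothetical placeholders chosen to make the trend visible, and the output voltage is simply taken as the total current times Rout.

```python
def parallel(r1, r2):
    """Resistance of r1 in parallel with r2."""
    return r1 * r2 / (r1 + r2)


def r_out(n_cs, r_load=50.0, r_o5=20e3, r_sw=100.0):
    """Equation 4.9: output resistance per side with n_cs active current cells.
    r_o5 and r_sw are hypothetical values."""
    if n_cs == 0:
        return r_load
    return parallel(r_load, (r_o5 + r_sw) / n_cs)


def v_out(n_cs, i_unit=0.667e-3):
    """Output voltage of a passive load for n_cs cells, each sourcing i_unit
    (a hypothetical unit current)."""
    return n_cs * i_unit * r_out(n_cs)


levels = [v_out(n) for n in range(16)]
steps = [levels[n + 1] - levels[n] for n in range(15)]
shrink_percent = 100 * (steps[0] - steps[-1]) / steps[0]
```

With these placeholder values the last step is several percent smaller than the first, which is the code-dependent LSB variation that degrades linearity with a passive load.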
For an active load in figure 4.20(b), the resistor is connected in feedback through a
differential opamp. Since the opamp’s input impedance is much higher than RL , all current
will flow into RL . The active load offers higher swing than the passive load since the
output currents are now connected to the opamp’s inputs, which act as an AC virtual ground.
Also, since Rout looking into Vout is constant, the active load does not suffer the Vout variations
seen with a passive load. The disadvantages of an active load are limited bandwidth
due to the closed-loop (feedback) configuration and non-idealities from the opamp’s design
(e.g.: finite gain, offset, bandwidth, etc). An opamp gain of 60dB is sufficient for this design.
78 Chapter 4. Time-interleaved ∆Σ-DAC Implementation
Figure 4.21 compares the output staircase for an active versus a passive load. It shows
the increase in Vout variations for the passive load as more current is switched to either side.
Therefore, an active load is more favourable in this design.
This design is fabricated using STMicroelectronics 90nm CMOS with 7 metal layers and 1V
supply for the entire chip. While a low supply voltage lowers the power consumption of the
large digital circuitry, it makes the analog design challenging. For instance, this supply
does not allow multiple transistors to be stacked, since the limited headroom cannot keep
every device in its intended operating region.
Another major issue of deep sub-micron technology is high leakage current, which
includes subthreshold leakage, gate oxide tunneling leakage, junction leakage, hot-carrier
injection leakage, gate-induced drain leakage, and punch-through leakage currents [48]. As a
result, although the design dissipates minimal dynamic power during switching, its static
leakage power becomes a comparable share of the total. For instance, the simulated total
power consumption for the digital front end is around 51mW, of which 23mW (45%) is leakage
power. There is a great deal of ongoing research to replace the SiO2 gate dielectric with a
high-k dielectric material to combat increasing leakage currents and to sustain the scaling
of CMOS technology.
Lastly, since STMicroelectronics 90nm CMOS was a rather new technology, its model
parameters were still not accurate or well defined. For example, the gate resistance was not
modelled, hence a gate resistor was added to the transistor with a value according to [49]:
\[ R_g = \frac{R_{gsq}\, W_f}{3\, N_f\, l_G} + \frac{R_{cont}}{N_{cont}\, N_f} \qquad (4.10) \]
where Nf and Wf are the number of fingers and finger width in µm, Rgsq and Rcont are
gate resistance/square and gate contact resistance, and Ncont and lG are the number of gate
contacts and gate length in µm, respectively. Table 4.3 summarizes the CS-DAC design
including transistor sizes, layout, and drain current.
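Equation 4.10 can be evaluated directly when sizing the added gate resistor. The following is a minimal sketch; the function name and the example values are hypothetical and are not the parameters of the STMicroelectronics process.

```python
def gate_resistance(n_f, w_f_um, l_g_um, r_gsq, r_cont, n_cont):
    """Equation 4.10: Rg = Rgsq*Wf / (3*Nf*lG) + Rcont / (Ncont*Nf).

    n_f     number of gate fingers
    w_f_um  finger width in um
    l_g_um  gate length in um
    r_gsq   gate resistance per square in ohm
    r_cont  resistance of one gate contact in ohm
    n_cont  number of gate contacts per finger
    """
    distributed = r_gsq * w_f_um / (3.0 * n_f * l_g_um)
    contact = r_cont / (n_cont * n_f)
    return distributed + contact


# Hypothetical example values, for illustration only
example_rg = gate_resistance(n_f=10, w_f_um=1.0, l_g_um=0.1,
                             r_gsq=10.0, r_cont=20.0, n_cont=1)
print(f"Rg = {example_rg:.3f} ohm")
```

Both terms scale as 1/Nf, which is consistent with the use of multi-finger transistors to lower the gate resistance.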
Figure 4.23(a) depicts an example of a TIM ∆Σ-DAC output spectrum with and without
current calibration. With calibration on, the inband harmonics are reduced, resulting
in higher linearity. Figure 4.23(b) depicts the SNR performance versus input amplitude
with and without current calibration. It shows that with calibration on, there is an average
SNR improvement of 1dB and 5dB for input amplitudes below and above -10dBFS, respectively.
Unless mentioned otherwise, all subsequent simulation results have the current
calibration on.
Figure 4.23: TIM ∆Σ-DAC performance with and without current calibration (for active
load, TT corner with transistor mismatch)
Figures 4.24(a) and 4.24(b) depict the TIM ∆Σ-DAC’s accuracy performance versus input
amplitude without and with transistor mismatch, respectively. Similarly, figures 4.25(a) and
4.25(b) depict the TIM ∆Σ-DAC’s accuracy performance versus input frequency without
and with transistor mismatch, respectively. Without transistor mismatch, the peak SNRs
are 57dB (9.2 bits) and 62dB (10 bits) for passive and active load, respectively. However,
with transistor mismatch, the peak SNRs degrade to 50dB (8 bits) and 54dB (8.7 bits), and
the dynamic ranges are 52dB and 56dB for passive and active load, respectively.
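The bit counts quoted alongside the SNR values can be cross-checked with the standard sine-wave relation ENOB = (SNR - 1.76)/6.02. The sketch below assumes this is the conversion used; the function name is hypothetical.

```python
def enob(snr_db):
    """Effective number of bits for a sine-wave input with the given SNR in dB."""
    return (snr_db - 1.76) / 6.02


# Simulated peak SNRs (dB) and the bit counts quoted in the text
quoted = {57: 9.2, 62: 10.0, 50: 8.0, 54: 8.7}
for snr_db, bits in quoted.items():
    print(f"{snr_db} dB -> {enob(snr_db):.1f} bits (quoted: {bits})")
```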
Lastly, table 4.4 shows the simulated power consumption of the TIM ∆Σ-DAC at a 1V supply.
The digital front-end consumes the most power since it performs the most computation
and forms the largest hardware partition.
Figure 4.24: TIM ∆Σ-DAC’s SNR/SNDR vs. Input amplitude for a single-tone input at
0.25fB (TT corner)
Figure 4.25: TIM ∆Σ-DAC’s SNR/SNDR vs. Input frequency for a single-tone amplitude
of 0dBFS (TT corner)
4.4. TIM ∆Σ-DAC Integration 83
The chip is designed to have several power supplies for better power management and
testing flexibility. Specifically, the supplies for the baseband digital, high-speed interface, and
high-speed analog sections are VDDd, VDDhs, and VDDa, respectively. The I/O drivers also have
their own power supply, VDDio. Thus, if necessary, each section of the chip can operate on
a different power supply to improve performance. In this work, the entire chip operates on
a 1V supply. Also, having multiple power supplies gives the flexibility to test each section of
the chip separately, while powering down the rest of the chip.
Figures 4.27 and 4.28 show the floor planning and final layout of the TIM ∆Σ-DAC. The
chip occupies about 1.52mm×1.52mm of silicon area in 90nm CMOS technology. The layout
was pad-limited. The pad frame contains all analog/RF pads without ESD protection to
achieve high speed operation. The core area fits within 1.06mm2 , of which 0.34mm2 contains
the digital standard cells for TIM8-IF and TIM8-DSM.
Both the high-speed digital interface and the analog back-end require a custom layout
optimized for high-speed and low-mismatch operation. To accommodate a high-speed layout,
multi-finger transistors with double-gate connections are used to minimize gate resistance. In
addition, high metal layers (M4-M7) are used for high-speed signal routing to minimize
substrate parasitic capacitance. To reduce mismatches between current cells, the cells are laid out
as close together as possible and each contains at least 2 dummy gates on each side. Furthermore, a
common centroid or finger inter-digitation layout is used to improve matching.
Lastly, each subcircuit (e.g.: current mirror) is surrounded with a ring of substrate
contacts to reduce substrate resistance and crosstalk. Multiple N/P-well rings surround each
digital or analog section, as shown in figure 4.28, to provide as much isolation as possible.
This chapter presents the experimental results of the TIM ∆Σ-DAC fabricated in
STMicroelectronics 90nm CMOS. Specifically, the accuracy and linearity performance of the TIM
∆Σ-DAC are measured. The experimental setup and testing issues are also discussed here.
In order to test the functionalities of the TIM ∆Σ-DAC, the chip must be packaged then
integrated onto a PCB. Since this chip operates at a relatively high speed, it is important
to select a package and bonding material which are capable of high-speed operation. In
this work, the package is a 32-pin ceramic FlatPack (FP32) that uses gold bond wires and
supports an operating frequency up to 7GHz. Figures 5.2(a) and 5.2(b) show the packaged
chip and its integration with the PCB, respectively.
Figure 5.1: Die photos of the TIM ∆Σ-DAC chip fabricated in 90nm CMOS
(a) Chip package and bonding (b) TIM ∆Σ-DAC test PCB
The PCB supports several testing configurations. Firstly, it permits testing with either a
passive or active load. Switches steer the output currents to either a grounded 50Ω resistor
or a resistor in feedback around a differential opamp. Due to the low voltage supply (1V) and
broad bandwidth (250MHz) of the TIM ∆Σ-DAC, it is hard to find a commercial differential
opamp which will not limit the performance of the device under test (DUT). For example,
a differential opamp from Texas Instruments (THS4508) has sufficient gain and bandwidth;
however, its minimum common-mode output voltage is still higher than 1V. Since the DUT’s
outputs are open-drain PMOS devices, having a drain voltage higher than its supply can
damage the entire chip via the forward-biased diode in the N-well. Thus, a passive load of
50Ω is used for all measurements.
Secondly, to allow higher testing flexibility, the PCB is designed to support testing with
either an Agilent 93000 SOC tester or an Agilent 81250 parallel bit-error-rate (ParBert)
tester as the input source. The full test setup is depicted in figure 5.4. Both the 93K SOC
and ParBert testers have the ability to test the entire chip, the digital front-end or the analog
back-end alone. For full-chip testing, the tester sends a 10-bit digital pattern (generated
from Matlab) to the TIM ∆Σ-DAC. A 2-way 180° power combiner is used to convert the
differential outputs into a single-ended analog output. Lastly, a spectrum analyzer is used
to analyze the analog spectrum and to capture data for the Matlab post-processing. Due to
some design issues in the digital front-end, which will be discussed in the next section, the
full chip was not tested. Thus, only test results from the analog back-end and its interface (i.e.:
B2T converter & switch driver) are reported.
To test the analog back-end alone, a VHDL simulation is used to generate the 4-bit digital
output of the TIM-IF-DSM. This data is then multiplexed in Matlab and transferred to
the ParBert, which in turn sends a 4-bit data pattern to the chip’s I/Os. Figures 5.3 and
5.5 show the analog test flow and its experimental setup, respectively.
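The multiplexing step can be pictured as a round-robin interleave of the eight parallel TIM-IF-DSM outputs into one 4-bit stream. The thesis performs this step in Matlab and does not give the script, so the Python sketch below is only an assumption about its structure: the channel ordering, the MSB-first bit lanes, and the function names are all hypothetical.

```python
def interleave(channels):
    """Serialize equal-length parallel channels in round-robin order."""
    if len({len(c) for c in channels}) > 1:
        raise ValueError("all channels must have the same length")
    # zip(*channels) yields one frame (one sample per channel) at a time
    return [sample for frame in zip(*channels) for sample in frame]


def to_bit_lanes(codes, width=4):
    """Split each code into `width` single-bit lanes, MSB first."""
    return [[(c >> b) & 1 for c in codes] for b in range(width - 1, -1, -1)]


# Eight channels, two samples each: channel k carries samples k and k + 8
channels = [[k, k + 8] for k in range(8)]
serial = interleave(channels)
lanes = to_bit_lanes(serial)
print(serial)
```

Each of the four lanes would then correspond to one data line driven by the pattern generator.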
Figure 5.4: Full test setup for Agilent 93K SOC or Agilent ParBert platform
1. The “roundoff error reduction scheme” (section 4.1.2): The fabricated TIM-IF utilized
the “truncation with no rounding” technique which caused large roundoff errors and
limited the SNR of the entire TIM ∆Σ-DAC.
2. The “unnecessary overflow bit” (section 4.1.5): The fabricated TIM-DSM was over-
designed to have a final sum of 12 bits because a digital limiter was not yet introduced.
Since the 12th bit was mostly ’0’, this resulted in a loss of output amplitude.
Thus, the digital front-end measurements were omitted since its outputs would not contain
meaningful data. However, the simulations in chapter 4 and the analog back-end’s experimental
results in this chapter employ a digital design with all of these errors corrected.
Here, the corrected VHDL digital front-end results are imported into Cadence for a full-chip
mixed-signal simulation; this method detects any system integration or design errors.
92 Chapter 5. Time-interleaved ∆Σ-DAC Performance
(a) Measured staircase transient (b) Simulated staircase transient (SS, 105°C)
Secondly, the current calibration circuitry was verified. For a passive load, aside from
an increase of around 100mV in Vout , there was no improvement in accuracy or linearity
regardless of calibration being on or off. Instead, having the calibration on introduced some
calibration feed-through which mixed with the fundamental signal, and generated inband
tones. For example, figures 5.7(a) and 5.7(b) show the calibration feed-through tones for an
input at 0.29fB and CLKcalib at 0.4fB and 0.2fB , respectively. In figure 5.7(a), the second-
order intermodulation product (IM2), fsignal + fcalib , shows up at 0.7fB (marker 4). In figure
5.7(b), the IM2 shows up at 0.49fB (marker 3), as well as its harmonics at markers 2 and 4.
Thus, the calibration circuitry is switched off for all subsequent measurements.
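The IM2 locations quoted above follow from simple frequency arithmetic. The sketch below works in units of fB; the function name is hypothetical, and the difference products are computed only for completeness since the text does not report them.

```python
def im2_products(f_signal, f_calib):
    """Second-order intermodulation frequencies of two tones (same units as inputs)."""
    return {
        "sum": f_signal + f_calib,
        "difference": abs(f_signal - f_calib),
    }


# Input tone at 0.29 fB with the two calibration clock settings
im2_a = im2_products(0.29, 0.4)   # figure 5.7(a) setting
im2_b = im2_products(0.29, 0.2)   # figure 5.7(b) setting
print(f"CLKcalib = 0.4 fB: sum product at {im2_a['sum']:.2f} fB")
print(f"CLKcalib = 0.2 fB: sum product at {im2_b['sum']:.2f} fB")
```

The sum products land at 0.69fB and 0.49fB, both inside the signal band, which is why the feed-through cannot be removed by the output filter.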
(a) Calibration feed-through for 0.29fB input and CLKcalib = 0.4fB
(b) Calibration feed-through for 0.29fB input and CLKcalib = 0.2fB
Lastly, the clock divide-by-8 circuitry was verified, even though it was intended to generate
a clock for the digital front-end. Figures 5.8(a) and 5.8(b) show two examples of the clock
divider operating with 2.66GHz and 2.0GHz clock inputs, respectively. The divided clocks
are 332.9MHz and 249.9MHz, both within 0.2% of one eighth of the input frequency,
implying the clock divider operates correctly.
(a) Divided-by-8 clock for 2.66GHz input (b) Divided-by-8 clock for 2GHz input
(a) Transient response for 0.13fB input (b) Transient response for 0.29fB input
Figure 5.9: CS-DAC transient response for a single-tone, 0dBFS input amplitude (top -
single ended outputs; bottom - differential output)
5.3. High Speed Analog Measurements 95
(a) Noise shaped spectrum for 0.13fB input (b) Inband spectrum for 0.13fB input
(c) Noise shaped spectrum for 0.29fB input (d) Inband spectrum for 0.29fB input
Figure 5.10: Noise shape and inband spectra for a single-tone, 0dBFS input amplitude at
0.13fB and 0.29fB
Figure 5.11 shows the measured and simulated CS-DAC accuracy performance. The
measured SNR and SNDR are almost identical since the dominant noise source is the noise
floor rather than the harmonic distortions. For the measurements versus amplitude, the
input is a single tone at 0.25fB ; for the measurements versus frequency, the input amplitude
is 0dBFS. Figure 5.11(a) shows a peak measured SNR/SNDR of 46dB, which corresponds
to an accuracy of 7.3 bits. The dynamic range is also around 46dB. Figure 5.11(b) shows
a measured accuracy of at least 44dB (7 bits) up to 0.8fB , and 38dB (6 bits) for the entire
bandwidth. Compared to the simulated results (passive load with transistor mismatch),
there is an average discrepancy of 5dB due to unaccounted parasitics and PVT variations.
(a) SNR and SNDR vs. Input amplitude (b) SNR and SNDR vs. Input frequency
Figure 5.11: CS-DAC accuracy performance with single-tone input and passive load
(a) Two-tone spectrum near 0.25fB (b) Two-tone spectrum near 0.93fB
Another linearity measurement is the “missing-tone” test, in which the inband spectrum
contains multiple equally-spaced tones with the middle one left empty. The
intermodulation products of these tones that fall into the empty bin cause the
“missing-tone” to appear. The amplitude difference between the input signal tones and the
“missing-tone” is called the Multi-tone Power Ratio (MTPR), which reflects the system
linearity. This test is particularly relevant for systems employing OFDM since the transmitted
spectrum consists of many sub-channels at equally-spaced frequencies.
This experiment uses 128 tones (sub-channels) based on a UWB standard from [6], in
which the 64th tone is left empty. For a bandwidth of 166MHz, this corresponds to a
sub-channel spacing of 1.3MHz. Figures 5.13(a) and 5.13(b) show the multi-tone noise shaped
spectrum and the MTPR measurement, respectively. The measured MTPR is 38dB.
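The missing-tone test can be reproduced in a small numerical experiment. The sketch below is not the measurement procedure of this work: it builds 127 unit-amplitude tones on DFT bins 1 to 128 with bin 64 left empty, uses pseudo-random tone phases, and models the non-linearity with an arbitrary weak cubic term. The 1024-sample record length, the 1e-4 cubic coefficient, and all function names are assumptions made for illustration.

```python
import cmath
import math
import random


def multitone(n_samples, n_tones, missing, seed=0):
    """Sum of unit-amplitude tones on bins 1..n_tones, skipping bin `missing`."""
    rng = random.Random(seed)
    phases = {k: rng.uniform(0.0, 2.0 * math.pi)
              for k in range(1, n_tones + 1) if k != missing}
    return [sum(math.cos(2.0 * math.pi * k * n / n_samples + p)
                for k, p in phases.items())
            for n in range(n_samples)]


def bin_level_db(x, k):
    """Level of DFT bin k in dB (single-bin DFT, tiny floor avoids log of zero)."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
    return 20.0 * math.log10(abs(acc) / n + 1e-30)


def mtpr_db(x, n_tones, missing):
    """Average tone level (mean of dB values) minus the level in the empty bin."""
    tones = [bin_level_db(x, k) for k in range(1, n_tones + 1) if k != missing]
    return sum(tones) / len(tones) - bin_level_db(x, missing)


N_SAMPLES, N_TONES, MISSING = 1024, 128, 64
spacing_mhz = 166.0 / N_TONES                         # sub-channel spacing
ideal = multitone(N_SAMPLES, N_TONES, MISSING)
distorted = [v + 1e-4 * v ** 3 for v in ideal]        # assumed weak cubic term
mtpr_ideal = mtpr_db(ideal, N_TONES, MISSING)
mtpr_distorted = mtpr_db(distorted, N_TONES, MISSING)
print(f"spacing = {spacing_mhz:.2f} MHz")
print(f"MTPR ideal = {mtpr_ideal:.1f} dB, with cubic term = {mtpr_distorted:.1f} dB")
```

Without the cubic term the empty bin holds only numerical noise, so the MTPR is very large. With it, third-order products of the surrounding tones fill the bin and the MTPR drops to a finite value.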
(a) Multi-tone noise shaped spectrum (b) Multi-tone power ratio measurement
Power Distribution
Total (mW) @ 1V Supply 120 107
The measured power consumption is 107mW, of which 32mW is due to the analog prototype
sampled at 2.66GS/s and 75mW is due to the digital front-end sampled at 250MS/s.
The digital front-end was tested at 250MS/s instead of 333MS/s due to the speed limitation
of the Agilent 93K SOC tester. The measured power consumption is 102mW when the
digital front-end is sampled at 250MS/s while the rest of the chip is sampled at 2GS/s.
Overall, the TIM ∆Σ-DAC power distribution shows that the digital front-end consumes the
most power since it contains a large amount of digital circuitry and performs most of the computation.
Parameter                  Simulated, active load    Simulated, passive load   Measured   Unit
                           no mismatch   mismatch    no mismatch   mismatch
Peak SNR                   62            54          57            50          46         dB
Peak SNDR                  60            52          55            48          46         dB
Dynamic Range              63            56          58            52          46         dB
Peak SFDR                  -             -           -             -           56         dB
MTPR (128 tones)           -             -           -             -           38         dB
Bandwidth (fB)             250                                                 166        MHz
Sampling Rate (fS)         4                                                   2.66       GS/s
Oversampling Ratio (OSR)   8                                                   8          -
Supply Voltage             1                                                   1          V
Power                      120                                                 107        mW
Area                       1.52mm × 1.52mm
Process Technology         STMicroelectronics 90nm CMOS, 7M2T
Chapter 6
Conclusions
In conclusion, this thesis presents the analysis and design of a time-interleaved delta-sigma
digital-to-analog converter (TIM ∆Σ-DAC). The digital front-end of the TIM ∆Σ-DAC
comprises a 95th -order time-interleaved-by-8 FIR interpolation filter (TIM-IF) and a 3rd -
order, 4-bit, time-interleaved-by-8 ∆Σ modulator (TIM-DSM). The analog back-end of the
TIM ∆Σ-DAC comprises a 4-bit current-steering DAC with continuous current calibration.
The high-speed digital interface between these two domains comprises an 8-to-1 ring
multiplexer, a binary-to-thermometer converter, and 15 switch drivers.
The time-interleaved architecture uses parallelism based on block digital filtering to sup-
port a low OSR of 8; this results in a large effective bandwidth for broadband applications.
The TIM-DSM utilizes an error-feedback architecture with an optimized NTF zero to improve
SNR performance. The digital front-end (TIM-IF-DSM) implementation uses CSD representation
with a rounding scheme for minimum round-off errors, and parallel CSA adders with
optimized staging for minimum propagation delays.
The eight parallel outputs of the TIM-IF-DSM are serialized into a single 4-bit stream
through an 8-to-1 ring multiplexer. These bits are converted into thermometer codes and then
into an analog signal using 15 current-steering cells. An additional dummy current-steering
cell is used to allow continuous current calibration. The differential analog outputs are
open-drain, which gives the flexibility of having either a passive or an active output load.
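The binary-to-thermometer conversion that feeds the 15 current-steering cells can be sketched behaviourally as follows. This is a software model under the assumption of a plain thermometer code (cell i is on when i is below the input code); it says nothing about the gate-level implementation on the chip, and the function name is hypothetical.

```python
def binary_to_thermometer(code, n_bits=4):
    """Return 2**n_bits - 1 thermometer bits; the first `code` cells are on."""
    n_cells = (1 << n_bits) - 1
    if not 0 <= code <= n_cells:
        raise ValueError("code out of range for the given word length")
    return [1 if i < code else 0 for i in range(n_cells)]


# Every 4-bit code turns on exactly `code` of the 15 unit cells
for code in (0, 3, 15):
    print(code, binary_to_thermometer(code))
```

Because consecutive codes differ by a single cell, each step of the 4-bit word switches one unit current source.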
The TIM ∆Σ-DAC was designed to operate at 4GS/s with a bandwidth of 250MHz.
The simulation results show a peak SNR of 62dB and 57dB for active and passive load with
no transistor mismatch, respectively; the peak SNRs are 54dB and 50dB, with transistor
mismatch.
The chip was fabricated in STMicroelectronics 90nm CMOS. The analog back-end was
tested with modulated data from VHDL simulation of the digital front-end. It was measured
at 2.66GS/s and achieved a bandwidth of 166MHz, an SNR of 46dB and an SFDR of 56dB.
At 2GS/s, the prototype consumed 102mW from a 1V supply.
Table 6.1 briefly compares the performance of this work with the prior state-of-the-art
which utilizes parallelism in ∆Σ modulation (either time-division multiplexing, TDM, or
time-interleaving, TIM).
Appendix A: Conventional ∆Σ Modulator
The function of a digital ∆Σ modulator (DSM) is to reduce the word-length of the input
signal to a few bits without affecting its in-band spectrum. Since the reduction in word-length
introduces a large truncation error, the modulator must push this added noise outside the
band of interest, hence the term “noise shaping”.
The conventional first-order single-bit DSM is shown in figure A.1. It contains three
main components: the digital loop filter H(z) (i.e.: Σ), the bit truncator T, and the feedback
delay & subtractor (i.e.: ∆). Although this system is highly non-linear, a simple linear model
in the z-domain can be used to analyze its operation. Since the main noise component is
generated by the truncator T, its linear model is represented by an additive noise source,
E(z).
From figure A.1, the input and output of a first-order DSM can be related as follows:
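The relation itself does not appear in this copy. A standard linear-model form, consistent with the STF and NTF stated in the sentence that follows, is sketched here; X(z) and Y(z) are assumed names for the modulator input and output:

```latex
\[ Y(z) = STF(z)\, X(z) + NTF(z)\, E(z) \]
```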
where the signal transfer function, STF(z) = 1 and the noise transfer function, NTF(z) =
(1 − z^{-1}). Here, the signal is the exact replica of the input while the truncation noise is shaped
by a high-pass response (which suppresses the noise near DC and amplifies the out-of-band
noise). For an nth -order lowpass DSM, the system transfer function is:
in which NTF(z) = (1 − z^{-1})^n.
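A short behavioural model makes the first-order case concrete. The sketch below uses the error-feedback form, which has the same STF = 1 and NTF = (1 − z^{-1}) as the loop-filter structure of figure A.1 but is not that structure; the function name and the DC test input are assumptions made for illustration.

```python
def first_order_dsm(x):
    """First-order, single-bit error-feedback modulator.

    Returns (y, e): the +/-1 output and the truncation error e[n] = y[n] - v[n],
    where v[n] = x[n] - e[n-1] is the truncator input.
    """
    y, e = [], []
    e_prev = 0.0
    for sample in x:
        v = sample - e_prev                # subtract the previous truncation error
        out = 1.0 if v >= 0.0 else -1.0    # single-bit truncator
        err = out - v                      # error added by the truncator
        y.append(out)
        e.append(err)
        e_prev = err
    return y, e


dc_input = [0.25] * 1000
dsm_out, dsm_err = first_order_dsm(dc_input)
dsm_mean = sum(dsm_out) / len(dsm_out)
print(f"mean of 1-bit output = {dsm_mean:.4f}")
```

Since y[n] = x[n] + e[n] − e[n−1], the error contributions telescope when the output is averaged, so the mean of the 1-bit stream tracks the input while the error itself is pushed to high frequencies.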
If the input signal is a full-scale sine wave with peak amplitude A and the truncation
error is assumed to be uniformly distributed, the signal to noise ratio (SNR) for 1st -order
DSM can be approximated as [1]:
\[ SNR = \frac{9 A^2\, (OSR)^3}{2\pi^2} \qquad (A.4) \]
In equation A.4, the OSR is the oversampling ratio which defines how fast the system is
oversampled with respect to the Nyquist-rate. It is the ratio between the system sampling
frequency, fS , and twice the signal bandwidth, fB (i.e.: the Nyquist sampling frequency).
\[ OSR = \frac{f_S}{2 f_B} \qquad (A.5) \]
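Equations A.4 and A.5 can be evaluated for the sampling rates and bandwidths quoted in this work. In the sketch below the function names are hypothetical and the amplitude A = 1 is an assumed full-scale value for a single-bit truncator with output levels of ±1.

```python
import math


def osr(f_s, f_b):
    """Equation A.5: oversampling ratio for sampling rate f_s and bandwidth f_b."""
    return f_s / (2.0 * f_b)


def snr_first_order_db(amplitude, osr_value):
    """Equation A.4 expressed in dB (first-order, single-bit DSM)."""
    snr = 9.0 * amplitude ** 2 * osr_value ** 3 / (2.0 * math.pi ** 2)
    return 10.0 * math.log10(snr)


design_osr = osr(4e9, 250e6)         # designed: 4GS/s, 250MHz
measured_osr = osr(2.66e9, 166e6)    # measured: 2.66GS/s, 166MHz
print(f"design OSR = {design_osr:.2f}, measured OSR = {measured_osr:.2f}")
print(f"first-order SNR at OSR 8 = {snr_first_order_db(1.0, design_osr):.2f} dB")
```

Equation A.4 grows with the cube of the OSR, so each doubling of the OSR adds about 9dB for a first-order modulator.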
The resolution of a data converter is often specified by its effective number of bits (ENOB)
which is related to the output SNR (in dB) with a sine-wave input by the following equation:
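The equation itself is missing from this copy. The standard sine-wave relation, which reproduces the bit counts quoted in chapters 4 and 5 (for example 46dB corresponding to 7.3 bits), is given here as an assumed reconstruction:

```latex
\[ ENOB = \frac{SNR_{dB} - 1.76}{6.02} \]
```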
In a ∆Σ-DAC, the SNR can be controlled by three main parameters: the OSR, the order
of H(z), and the number of truncator bits. Increasing any of these parameters will increase
the SNR which directly translates to an improvement in ENOB. However, there are always
trade-offs between resolution, speed, power consumption, and design complexity.
Appendix B: TIM ∆Σ-DAC Matlab Results
Figure B.3 shows the TIM-IF-DSM response for the system with an ideal “brick-wall”
filter and for the system with an analog LPF. Compared to the ideal filter, the analog LPF
results in about 2.3dB and 1.2dB degradation in SNR and SNDR, respectively, as depicted
in figure B.3(a). This degradation is quite acceptable since the full TIM ∆Σ-DAC, including
the analog filter, still yields about 9 bits accuracy up to 0.93fB , as depicted in figure B.3(b).
Figure B.2: TIM ∆Σ-DAC output spectrum with analog LPF for Matlab simulations with
0dBFS input amplitude at different input frequencies a) 0.13fB b) 0.25fB c) 0.50fB d) 0.93fB
(a) SNR and SNDR vs. Input amplitude (b) SNR and SNDR vs. Input frequency
Figure B.3: TIM ∆Σ-DAC response with an ideal vs. analog filter
B.2. TIM-IF-DSM Output Spectrum with DAC Mismatches 111
Figure B.4: TIM-IF-DSM output spectrum with thermometer DAC element mismatches
Appendix C: TIM ∆Σ-DAC Implementation
Figure C.5 depicts the schematic of a switch driver and a DFF. The DFF’s purpose is
to sample/re-time the thermometer codes at 4GS/s to ensure their proper timing alignment.
The latch between the output data path (Do) and its complement ($\overline{Do}$) is used to align their edge
intersections to half-swings. Lastly, the additional transmission gate on the Do path is used for
propagation delay matching.
The switches S1 and S2 are in the states depicted in figures C.6(a) and C.6(b) for
the calibration and operation phases, respectively. During calibration, S1 puts the MOS
transistor M1 into saturation due to its diode connection while S2 allows Iref to flow into
M1 . This forces the gate-source voltage (Vgs ) and the charge on the parasitic capacitance
Cgs of M1 to whatever value required so that its drain current, Ids , equals Iref . During the
operation phase, although S1 is opened, Vgs is theoretically unchanged since the charge on
Cgs is preserved. This allows S2 to source approximately the same current, Iref , from the
output.
In a practical implementation, S1 and S2 are made of MOS transistors. Whenever S1
switches off, its channel charge is partly dumped on to the gate of M1 (called “charge-
injection”), causing the charge on Cgs to decrease by the same amount. This results in a
sudden decrease of Vgs . In addition, another effect causes Vgs to decrease. Although S1 is
off, the reverse-biased diode between its source and substrate is still present, causing Vgs to
decrease gradually due to leakage current [40].
The reduction in Vgs , due to charge-injection (∆q) and leakage current (Ileak ), causes Ids
to decrease as a function of time according to the following calculations [40]:
\[ I_{ds}(t) = I_{ref} - g_m \frac{\Delta q}{C_{gs}} - g_m \frac{I_{leak}}{C_{gs}}\, t \qquad (C.1) \]
where $C_{gs} = \frac{2}{3} W L C_{ox}$ and $g_m = \sqrt{2 \mu C_{ox} \frac{W}{L} I_{ds}}$.
Equation C.2 indicates that after a certain time Tc , the cell needs to be re-calibrated to
maintain its output current with a specified accuracy.
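The droop described by equation C.1 and the resulting re-calibration interval can be estimated numerically. In the sketch below, Tc is obtained by rearranging equation C.1 for the time at which Ids has fallen by a chosen fraction of Iref; this rearrangement, the function names, and every device value are assumptions made for illustration and are not taken from the design.

```python
def i_ds(t, i_ref, g_m, c_gs, dq, i_leak):
    """Equation C.1: cell current t seconds after the end of calibration."""
    return i_ref - g_m * dq / c_gs - g_m * i_leak * t / c_gs


def recalibration_time(rel_error, i_ref, g_m, c_gs, dq, i_leak):
    """Time for Ids to drop by rel_error * Iref.

    Returns None when the charge-injection step alone exceeds the error budget.
    """
    budget = rel_error * i_ref - g_m * dq / c_gs   # current left for leakage droop
    if budget <= 0.0:
        return None
    return budget * c_gs / (g_m * i_leak)


# Hypothetical cell: 100uA reference, 1mS gm, 100fF Cgs, 0.01fC injected, 1pA leakage
cell = dict(i_ref=100e-6, g_m=1e-3, c_gs=100e-15, dq=1e-17, i_leak=1e-12)
t_c = recalibration_time(0.005, **cell)
print(f"Tc for 0.5% accuracy = {t_c * 1e6:.1f} us")
```

With these assumed numbers the charge-injection step uses 0.1% of the 0.5% budget and leakage consumes the remainder over the following interval.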
References
[1] Richard Schreier and Gabor C. Temes, Understanding Delta-Sigma Data Converters.
Hoboken, New Jersey, USA: John Wiley & Sons, Inc, 2005.
[3] Danijela Cabric, Mike S.W. Chen, David A. Sobel, Jing Yang and Robert W. Brodersen,
“Future wireless systems: UWB, 60GHz, and Cognitive radios,” in IEEE Custom
Integrated Circuits Conference, CICC, pp. 793–796, September 2005.
[5] D. Dardari and V. Tralli, “High-speed indoor wireless communications at 60 GHz with
coded OFDM,” IEEE Transactions on Communications, vol. 47, no. 11, pp. 1709–1721,
November 1999.
[6] B. Razavi, T. Aytur, C. Lam, F. Yang, K. Li, R. Yan, and H. Kang, “A UWB CMOS
transceiver,” IEEE Journal of Solid-State Circuits, vol. 40, no. 12, pp. 2555–2562,
December 2005.
[7] J. Balakrishnan, A. Batra, and A. Dabak, “A multi-band OFDM system for UWB
communication,” in IEEE Conference on Ultra Wideband Systems and Technologies,
pp. 354–358, 2003.
[8] P. Smulders, “60 GHz radio: prospects and future directions,” in Proceedings of IEEE
10th Symposium on Communications and Vehicular Technology, pp. 1–8, November
2003.
[9] T. C. Chen, “Where CMOS is going: trendy hype vs. real technology,” in IEEE
International Solid-State Circuits Conference, ISSCC, pp. 1–18, February 2006.
[10] Fu-Liang Yang, Jiunn-Ren Hwang, and Yiming Li, “Electrical characteristic fluctuations
in sub-45nm CMOS devices,” in IEEE Custom Integrated Circuits Conference, CICC,
vol. 1, pp. 691–694, 2006.
[11] Jing Cao, Haiqing Lin, Yihai Xiang, Chungpao Kao, and Ken Dyer, “A 10-bit
1GSample/s DAC in 90nm CMOS for embedded applications,” in IEEE Custom Integrated
Circuits Conference, CICC, vol. 1, pp. 165–168, 2006.
[12] K. Doris, J. Briaire, D. Leenaerts, M. Vertregt, and A. van Roermund, “A 12b 500MS/s
DAC with >70dB SFDR up to 120MHz in 0.18µm CMOS,” in IEEE International Solid-
State Circuits Conference, ISSCC, pp. 116–117, 588, February 2005.
[13] Anne Van den Bosch, Marc A. F. Borremans, Michel S. J. Steyaert, and Willy Sansen,
“A 10-bit 1-GSample/s Nyquist current-steering CMOS D/A converter,” IEEE Journal
of Solid-State Circuits, vol. 36, no. 3, pp. 315–324, March 2001.
[14] Chi-Hung Lin and Klaas Bult, “A 10-b, 500-Msample/s CMOS DAC in 0.6 mm2 ,” IEEE
Journal of Solid-State Circuits, vol. 33, no. 12, pp. 1948–1958, December 1998.
[15] David B. Barkin, Andrew C.Y. Lin, David K. Su, and Bruce A. Wooley, “A CMOS
oversampling bandpass cascaded D/A Converter with digital FIR and current-mode
semi-digital filtering,” IEEE Journal of Solid-State Circuits, vol. 39, no. 4, pp. 585–593,
April 2004.
[16] Todd S. Kaplan, Joseph F. Jensen, Charles H. Fields, and M. Frank Chang, “A 2-Gs/s
3-bit ∆Σ-Modulated DAC with tunable bandpass mismatch shaping,” IEEE Journal of
Solid-State Circuits, vol. 40, no. 3, pp. 603–610, March 2005.
[17] Susan Luschas, Richard Schreier, and Hae-Seung Lee, “Radio frequency digital-to-
analog converter,” IEEE Journal of Solid-State Circuits, vol. 39, no. 9, pp. 1462–1467,
September 2004.
[18] Fred Harris, and Pranesh Sinha, “On synthesizing high speed sigma-delta DACs by
combining the outputs of multiple low speed sigma-delta DACs,” in IEEE Conference
on Signals, Systems and Computers, vol. 2, pp. 1050–1054, November 2002.
[21] Mucahit Kozak, and Izzet Kale, “Novel topologies for time-interleaved Delta-Sigma
modulators,” IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal
Processing, vol. 47, no. 7, pp. 639–654, July 2000.
[22] Ian Galton, and Henrik T. Jensen, “Oversampling parallel Delta-Sigma modulator A/D
conversion,” IEEE Transactions on Circuits and Systems - II: Analog and Digital Signal
Processing, vol. 43, no. 12, pp. 801–810, December 1996.
[23] Ramin Khoini-Poorfard, Lysander B. Lim, and David A. Johns, “Time-interleaved
oversampling A/D converters: Theory and Practice,” IEEE Transactions on Circuits and
Systems-II: Analog and Digital Signal Processing, vol. 44, no. 8, pp. 634–645, August
1997.
[24] Katayoun Falakshahi, Chih-Kong Ken Yang, and Bruce A. Wooley, “A 14-bit, 10-
Msamples/s D/A converter using multibit ∆Σ modulation,” IEEE Journal of Solid-
State Circuits, vol. 34, no. 5, pp. 607–615, May 1999.
[25] Martin Clara, Wolfgang Klatzer, Andreas Wiesbauer, and Dietmar Straeussnigg, “A
350MHz low-OSR ∆Σ current-steering DAC with active termination in 0.13 µm
[27] Yunyoung Choi, and Franco Maloberti, “Design of oversampling current steering DAC
with 640Mhz equivalent clock frequency,” in IEEE International Symposium on Circuits
and Systems, ISCAS, vol. 1, pp. 109–112, May 2002.
[28] Tao Shui, R. Schreier, and F. Hudson, “Mismatch shaping for a current-mode multibit
Delta-Sigma DAC,” IEEE Journal of Solid-State Circuits, vol. 34, no. 3, pp. 331–338,
March 1999.
[29] I. Fujimori, A. Nogi, and T. Sugimoto, “A multibit Delta-Sigma audio DAC with 120-
dB dynamic range,” IEEE Journal of Solid-State Circuits, vol. 35, no. 8, pp. 1066–1073,
August 2000.
[33] Peter Kiss, Jesus Arias, Dandan Li, and Vito Boccuzzi, “Stable high-order Delta-Sigma
digital-to-analog converters,” IEEE Transactions on Circuits and Systems-I: Regular
Papers, vol. 51, no. 1, pp. 200–205, January 2004.
[34] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, New Jersey,
USA: P T R Prentice Hall, Inc, 1993.
[35] Mucahit Kozak, Mustafa Karaman, and Izzet Kale, “Efficient architectures for time-
interleaved oversampling Delta-Sigma converters,” IEEE Transactions on Circuits and
Systems-II: Analog and Digital Signal Processing, vol. 47, no. 8, pp. 802–810, August
2000.
[37] David A. Johns and Ken Martin, Analog Integrated Circuit Design. Toronto, Canada:
John Wiley & Sons, Inc, 1997.
[39] Jared Welz, and Ian Galton, “Necessary and sufficient conditions for mismatch shaping
in a general class of multibit DACs,” IEEE Transactions on Circuits and Systems-II:
Analog and Digital Signal Processing, vol. 49, no. 12, pp. 748–759, December 2002.
[41] D.A. Parker and K.K. Parhi, “Area-efficient parallel FIR digital filter implementations,”
in IEEE International Conference on Application-Specific Systems, Architectures and
Processors, pp. 93–111, August 1996.
[42] Bede Liu, “Effect of finite word length on the accuracy of digital filters - A Review,”
IEEE Transactions on Circuit Theory, vol. 18, no. 6, pp. 670–677, November 1971.
[43] Henry Samueli, “An improved search algorithm for the design of multiplierless FIR
filters with powers-of-two coefficients,” IEEE Transactions on Circuits and Systems,
vol. 36, no. 7, pp. 1044–1047, July 1989.
[44] Kyung-Ju Cho, Kwang-Chul Lee, Jin-Gyun Chung, and Keshab K. Parhi, “Design of
low-error fixed-width modified Booth multiplier,” IEEE Transactions on Very Large
Scale Integration Systems, vol. 12, no. 5, pp. 522–531, May 2004.
[45] Abdellatif Bellaouar, and Mohamed I. Elmasry, Low-Power Digital VLSI Design:
Circuits and Systems. Norwell, Massachusetts, USA: Kluwer Academic Publishers, 2000.
[46] Behzad Razavi, Design of Analog CMOS Integrated Circuits. New York, NY, USA:
McGraw-Hill Companies, Inc, 2001.
[47] Tae-young Oh, Christoph Jungemann, and Robert W. Dutton, “Hydrodynamic
simulation of RF noise in deep-submicron MOSFETs,” in International Conference on
Simulation of Semiconductor Processes and Devices, SISPAD, pp. 87–90, September
2003.
[49] Timothy O. Dickson, Rudy Beerkens, and Sorin P. Voinigescu, “A 2.5-V, 40-Gb/s
decision circuit using SiGe BiCMOS logic,” in Proceedings of the IEEE, pp. 206 – 209, June
2004.