Timing Attacks on Implementations
of
                Diffie-Hellman, RSA, DSS, and Other Systems
Abstract
By carefully measuring the amount of time required to perform private key operations, attackers may be
able to find fixed Diffie-Hellman exponents, factor RSA keys, and break other cryptosystems.Against a
vulnerable system, the attack is computationally inexpensive and often requires only known ciphertext.
Actual systems are potentially at risk, including cryptographic tokens, network-based cryptosystems, and
other applications where attackers can make reasonably accurate timing measurements. Techniques for
preventing the attack for RSA and Diffie-Hellman are presented. Some cryptosystems will need to be re-
vised to protect against the attack, and new protocols and algorithms may need to incorporate measures to
prevent timing attacks. Keywords: timing attack, cryptanalysis, RSA, Diffie-Hellman, DSS.
Introduction
Cryptosystems often take slightly different amounts of time to process different inputs. Reasons include
performance optimizations to bypass unnecessary operations, branching and conditional statements, RAM
cache hits, processor in-structions (such as multiplication and division) that run in non-fixed time, and a
wide variety of other causes. Performance characteristics typically depend on both the encryption key and
the input data (e.g., plaintext or ciphertext). While it is known that timing channels can leak data or keys
across a controlled perime-ter, intuition might suggest that unintentional timing characteristics would only
reveal a small amount of information from a cryptosystem (such as the Hamming weight of the key).
However, attacks are presented which can exploit timing measurements from vulnerable systems to find the
entire secret key.
Cryptanalysis of a Simple Modular Exponentiator
Diffe-Hellman and RSA private-key operations consist of computing R = yx mod n, where n is public and y
can be found by an eavesdropper. The at-tacker’s goal is to find x, the secret key. For the attack, the victim
must com-pute yx mod n for several values of y, where y, n, and the computation time are known to the
attacker. (If a new secret exponent x is chosen for each operation, the attack does not work.) The necessary
information and timing measurements might be obtained by passively eavesdropping on an interactive
protocol, since an attacker could record the messages received by the target and measure the amount of
time taken to respond to each y. The attack assumes that the attacker knows the design of the target system,
although in practice this could probably be inferred from timing information.
The attack can be tailored to work with virtually any implementation that does not run in fixed time, but is
first outlined using the simple modular exponentiation algorithm below which computes R = yx mod n,
where x is w bits long:
Let s0 = 1.
For k = 0 upto w _ 1:
If (bit k of x) is 1 then
Let Rk = (sk _ y) mod n.
Else
Let Rk = sk.
Let sk+1 = R2
k mod n.
EndFor.
Return (Rw_1).
The attack allows someone who knows exponent bits 0::(b-1) to find bit b. To obtain the entire exponent,
start with b equal to 0 and repeat the attack until the entire exponent is known.
Because the first b exponent bits are known, the attacker can compute the first b iterations of the For loop
to find the value of sb. The next iteration requires the first unknown exponent bit. If this bit is set, Rb = (sb
_ y) mod n will be computed. If it is zero, the operation will be skipped.
The attack will be described first in an extreme hypothetical case. Suppose the target system uses a modular
multiplication function that is normally extremely fast but occasionally takes much more time than an
entire normal modular exponentiation. For a few sb and y values the calculation ofRb = (sb _ y) mod n will
be extremely slow, and by using knowledge about thetarget system’s design the attacker can determine
which these are. If the totalmodular exponentiation time is ever fast when Rb = (sb _y) mod n is slow,
exponent bit b must be zero. Conversely, if slow Rb = (sb _y) mod n operations always
result in slow total modular exponentiation times, the exponent bit is probably
set. Once exponent bit b is known, the attacker can verify that the overall oper-
ation time is slow whenever sb+1 = R2b mod n is expected to be slow. The same
set of timing measurements can then be reused to find the following exponent bits.
Error Correction
If exponent bit b is guessed incorrectly, the values computed for Rk_b will be incorrect and, so far as the
attack is concerned, essentially random. The time 2 required for multiplies following the error will not be
reected in the overall exponentiation time. The attack thus has an error-detection property; after an incorrect
exponent bit guess, no more meaningful correlations are observed. The error detection property can be
used for error correction. For example, the attacker can maintain a list of the most likely exponent
intermediates along with a value corresponding to the probability each is correct. The attack is continued
for only the most likely candidate. If the currently-favored value is incorrect, it will tend to fall in ranking,
while correct values will tend to rise. Error correction techniques increase the memory and processing
requirements for the attack, but can greatly reduce the number of samples required.
4 The General Attack
The attack can be treated as a signal detection problem. The \signal” consists of the timing variation due to
the target exponent bit, and \noise” results from measurement inaccuracies and timing variations due to
unknown exponent bits. The properties of the signal and noise determine the number of timing measure-
ments required to for the attack.
Given j messages y0; y1; :::; yj_1 with corresponding timing measurements
T0; T1; :::; Tj_1, the probability that a guess xb for the first b exponent bits is
correct is proportional to
P(xb) /
j_1
Yi=0
F(Ti _ t(yi; xb))
where t(yi; xb) is the amount of time required for the first b iterations of the yx i mod n computation using
exponent bits xb, and F is the expected probability distribution function of T _ t(y; xb) over all y values and
correct xb. Because F is de_ned as the probability distribution of Ti _t(yi; xb) if xb is correct, it is the best
function for predicting Ti _ t(yi; xb). Note that the timing measurements and intermediate s values can be
used improve the estimate of F.
Given a correct guess for xb_1, there are two possible values for xb. The
probability that xb is correct and x0
b is incorrect can be found as
Qj_1
i=0 F(Ti _ t(yi; xb))
Qj_1
i=0 F(Ti _ t(yi; xb)) +Qj_1
i=0 F(Ti _ t(yi; x0
b))
:
In practice, this formula is not very useful because finding F would require extraordinary e_ort.
5 Simplifying the Attack
Fortunately it is generally not necessary to compute F. Each timing observation
consists of T = e+Pw_1
i=0 ti, where ti is the time required for the multiplication
and squaring steps for bit i and e includes measurement error, loop overhead,
3
etc. Given guess xb, the attacker can find Pb_1
i=0 ti for each sample y. If xb is
correct, subtracting from T yields e +Pw_1
i=0 ti _Pb_1
i=0 ti = e +Pw_1
i=b ti. Since
the modular multiplication times are e_ectively independent from each other and from the measurement
error, the variance of e +Pw_1 i=b ti over all observed samples is expected to be Var(e) + (w _b)Var(t).
However if only the first c < b bits of the exponent guess are correct, the expected variance will be Var(e) +
(w_b+2c)Var(t). Correctly-emulated iterations decrease the expected variance by Var(t), while iterations
following an incorrect exponent bit each increase the variance by Var(t). Computing the variances is easy
and provides a good way to identify correct exponent bit guesses.
It is now possible to estimate the number of samples required for the attack. Suppose an attacker has j
accurate timing measurements and has two guesses for the first b bits of a w-bit exponent, one correct and
the other incorrect with the first error at bit c. For each guess the timing measurements can be adjusted by
Pb_1 i=0 ti. The correct guess will be identi_ed successfully if its adjusted values have the smaller
variance.
It is possible to approximate ti using independent standard normal variables.
If Var(e) is negligible, the expected probability of a correct guess is
P j_1
Xi=0 _pw _ bXi +p2(b _ c)Yi_2
>
j_1
Xi=0 _pw _ bXi_2!
= P 2p2(b _ c)(w _ b)
j_1
Xi=0
XiYi + 2(b _ c)
j_1
Xi=0
Y2
i > 0!
where X and Y are normal random variables with _ = 0 and _ = 1. Because j
is relatively large, Pj_1
i=0 Y 2
i _ j and Pj_1
i=0 XiYi is approximately normal with
_ = 0 and _ = pj, yielding
P _2p2(b _ c)(w _ b)(pjZ) + 2(b _ c)j > 0_ = P Z > _ pj(b _ c) p2(w _ b)! where Z is a standard normal
random variable. Finally, integrating to find the probability of a correct guess yields _ _q j(b_c)
2(w_b)_, where _(x) is the area under the standard normal curve from_1 to x. The required number of
samples (j) is thus proportional to the exponent size (w). The number of measurements might be reduced if
attackers choose inputs known to have extreme timing character-istics at exponent locations of interest.
6 Experimental Results
Figure 1 shows the distribution of 106 modular multiplication times observed using the RSAREF
toolkit[10] on a 120-MHz PentiumTM computer running MSDOSTM. The distribution was prepared by
timing one million (a _ b mod n) calculations using a and b values from actual modular exponentiation
operations 4 with random inputs. The 512-bit sample prime #1 from the RSAREF DiffieHellman
demonstration program was used for n. A few wildly aberrant samples (which took over 1300_s) were
discarded. The Figure 1 distribution has mean _ = 1167.8_s and standard deviation _ = 12:01_s. The
measurement error is small; the tests were run twice and the average measurement diffrence was found to
be under 1_s. RSAREF uses the same function for squaring and multiplication, so squaring and
multiplication times have identical distributions. RSAREF precomputes y2 and y3 mod n and processes
two exponent bits at a time. In total, a 512-bit modular exponentiation with a random 256-bit exponent
requires 128 iterations of the modular exponentiation loop and a total of about 352 modular multiplication
and squaring operations. Each iteration of the modular exponentiation loop does two squaring operations
and, if either exponent bit is nonzero, one multiply. The attack can be adjusted to append pairs of exponent
bits and to evaluate four candidate values at each exponent position instead of two.
Since modular multiplications consume most of the total modular exponen-tiation time, it is expected that
the distribution of modular exponentiation times will be approximately normal with _ _ (1167:8)(352) =
411; 065:6_s and _ _ 12:01p352 = 225:3_s. Figure 2 shows measurements from 5000 actual mod-ular
exponentiation operations using the same computer and modulus, which yielded _ = 419; 901_s and _ =
235_s.
With 250 timing measurements, the probability that subtracting the time for a correct modexp loop iteration
from each sample will reduce the total variance more than subtracting an incorrect iteration is estimated to
be _ _q j(b_c) 2(w_b)_, where j = 250, b = 1, c = 0, and w = 127. (There are 128 iterations of the RSAREF
modexp loop for a 256-bit exponent, but the first iteration is ignored.) Correct guesses are thus expected
with probability _ _q250(1_0)
2(126) _ _ 0:84. The
5
5000 samples from Figure 2 were divided into 20 groups of 250 samples each, and variances from
subtracting the time for incorrect and correct modexp loop iterations were compared at each of the 127
exponent bit pairs. Of the 2450 trials, 2168 produced a larger variance after subtracting an incorrect
modexp loop time than after subtracting the time for a correct modexp loop, yielding a probability of 0.885.
The first exponent bits are most di_cult, since b becomes larger as more exponent bits become known and
the probabilities should improve. (The test above did not take advantage of this property.) It is important
to note that accurate timing measurements were used; measurement errors which are large relative to the
total modular exponentiation time standard deviation will increase the number of samples needed.
The attack is computationally quite easy. With RSAREF, the attacker has to evaluate four choices per pair
of bits. Thus the attacker only has to do four times the number of operations done by the victim, not
counting e_ort wasted by incorrect guesses.
7 Montgomery Multiplication and the CRT
Modular reduction steps usually cause most of the timing variation in a modu-lar multiplication operation.
Montgomery multiplication[6] eliminates the mod n reduction steps and, as a result, tends to reduce the size
of the timing character-istics. However, some variation usually remains. If the remaining \signal” is not
dwarfed by measurement errors, the variance in tb and the variance of Pw_1 i=b+1 ti would be reduced
proportionally and the attack would still work. However if the measurement error e is large, the required
number of samples will increase in proportion to 1 Var(ti)
.
The Chinese Remainder Theorem (CRT) is also often used to optimize RSA private key operations. With
CRT, (y mod p) and (y mod q) are computed first, where y is the message. These initial modular reduction
steps can be vulnerable to timing attacks. The simplest such attack is to choose values of y that are close to
p or q, then use timing measurements to determine whether the guessed value is larger or smaller than the
actual value of p (or q). If y is less than p, computing y mod p has no e_ect, while if y is larger than p, it is
necessary to subtract p from y at least once. Also, if the message is very slightly larger than p, y mod p will
have leading zero digits, which may reduce the amount of time required for the first multiplication step.
The speci_c timing timing attacks can be adapted to directly attack the mod p and mod q
modular exponentiations performed with the Chinese characteristics depend on the
implementation. RSAREF’s modular reduction function with a 512-bit modulus the Pentium computer with
y chosen randomly between 0 and 2p takes an average of 42.1_s if y < p, as opposed to 73.9_s if y > p.
Timing measurements from many y could be combined to successively approximate p. In some cases it
may be possible to improve the Chinese Remainder Theorem RSA attack to use known (not chosen)
ciphertexts, reducing the number of mes-sages required and making it possible to attack RSA digital
signatures. Modular reduction is done by subtracting multiples of the modulus, and exploitable timing
variations can be caused by variations in the number of compare-and-subtract 6 steps. For example,
RSAREF’s division loop integer-divides the uppermost two digits of y by one more than the upper digit of
p, multiplies p by the quotient, shifts left the appropriate number of digits, then subtracts the result from y.
If the result is larger than p (shifted left), a extra subtraction is performed. The decision whether to perform
an extra subtraction step in the first loop of the division algorithm usually depends only on y (which is
known) and the upper two digits of p. A timing attack could be used to determine the upper digits of p. For
example, an exhaustive search over all possible values for the upper two digits of p (or more e_cient
techniques) could identify value for which the observed times correlate most closely with the expected
number of subtraction operations. As with the DiffieHellman/non-CRT attack, once one digit of p has been
found, the timing measurements could be reused to find subsequent digits. It is not yet known whether
Remainder Theorem.
8 Timing Cryptanalysis of DSS
loop to attack new bits of x. If k_1 is precomputated, DSS signatures require just two The
Digital Signature Standard[5] computes s = (k_1(H(m) + x _ r)) mod q, where r and q are known to
attackers, k_1 is usually precomputed, H(m) is the hash of the message, and x is the private key. In practice,
(H(m) +x _ r) mod q would normally be computed first, then is multiplied by k_1 (mod q). If the modular
reduction function runs in non-_xed time, the overall signa-ture time should be correlated with the time for
the (x _ r mod q) computation. The attacker can calculate and compensate for the time required to compute
H(m). Since H(m) is of approximately the same size as q, its addition has little e_ect on the reduction time.
The most signi_cant bits of x _ r are typically the first used in the modular reduction. These depend on r,
which is known, and the most signi_cant bits of the secret value x. There would thus be a correlation
between values of the upper bits of x and the total time for the modular reduction. By looking for the
strongest probabilities over the samples, the at-tacker would try to identify the upper bits of x. As more
upper bits of x become known, more of x _ r becomes known, allowing the attacker to proceed through
more iterations of the modular reduction modular multiplication opera-tions, potentially making the amount
of additional timing noise which must be _ltered out relatively small.
9 Masking Timing Characteristics
The most obvious way to prevent timing attacks is to make all operations take exactly the same amount of
time. Unfortunately this is often di_cult. Making software run in _xed time, especially in a platform-
independent manner, is hard because compiler optimizations, RAM cache hits, instruction timings, and
other 7 factors can introduce unexpected timing variations. If a timer is used to delay returning results until
a pre-speci_ed time, factors such as the system respon-siveness or power consumption may still change
detectably when the operation _nishes. Some operating systems also reveal processes’ CPU usage. Fixed
time implementations are also likely to be slow; many performance optimizations cannot be used since all
operations must take as long as the slowest operation. (Note: Always performing the optional Ri = (si _ y)
mod n step does not make an implementation run in constant time, since timing characteristics from the
squaring operation and subsequent loop iterations can be exploited.)
Another approach is to make timing measurements so inaccurate that the
attack becomes unfeasible. Random delays added to the processing time do in-
crease the number of ciphertexts required, but attackers can compensate by col-
lecting more measurements. The number of samples required increases roughly
as the square of the timing noise. For example, if a modular exponentiator whose
timing characteristics have a standard deviation of 10 ms can be broken success-
fully with 1000 timing measurements, adding a random normally distributed
delay with 1 second standard deviation will make the attack require approxi-
mately _1000ms
10ms _2
(1000) = 107 samples. (Note: The mean delay would have to be several seconds to get a standard deviation
of 1 second.) While 107 samples is probably more than most attackers can gather, a security factor of 107 is
not usually considered adequate.
10 Preventing the Attack
Fortunately there is a better solution. Techniques used for blinding signatures[1]
can be adapted to prevent attackers from knowing the input to the modular ex-
ponentiation function. Before computing the modular exponentiation operation,
choose a random pair (vi; vf ) such that v_1
f = vi
x mod n. For DiffieHellman,
it is simplest to choose a random vi then compute vf = (v_1 i )x mod n. For RSA it is faster to choose a
random vf relatively prime to n then compute vi = (v_1 f )e mod n, where e is the public exponent. Before
the modular expo-nentiation operation, the input message should be multiplied by vi (mod n), and
afterward the result is corrected by multiplying with vf (mod n). The system should reject messages equal
to 0 (mod n).
Computing inverses mod n is slow, so it is often not practical to generate a
new random (vi; vf ) pair for each new exponentiation. The vf = (v_1
i )x mod n
calculation itself might even be subject to timing attacks. However (vi; vf ) pairs
should not be reused, since they themselves might be compromised by timing
attacks, leaving the secret exponent vulnerable. An e_cient solution to this
problem is update vi and vf before each modular exponentiation step by com-
puting v0
i = v2
i and v0
f = v2
f . The total performance cost is small (2 modular squarings, which can be precomputed, plus 2 modular
multiplications). More sophisticated update operations using exponents other than 2, multiplication with
other (vi; vf ) pairs, etc. can also be used, but do not appear to o_er any advantages.
8
If (vi; vf ) is secret, attackers have no useful knowledge about the input to the modular exponentiator.
Consequently the most an attacker can learn is the general timing distribution for exponentiation
operations. In practice, distribu-tions are close to normal and the 2w exponents cannot possibly be
distinguished. However, a maliciously-designed modular exponentiator could theoretically have a
distribution with sharp spikes corresponding to exponent bits, so blinding does not provably prevent timing
attacks.
Even with blinding, the distribution will reveal the average time per op-eration, which can be used to infer
the Hamming weight of the exponent. If anonymity is important or if further masking is required, a random
multiple of ‘(n) can be added to the exponent before each modular exponentiation. If this is done, care must
be taken to ensure that the addition process itself does not have timing characteristics which reveal ‘(n).
This technique may be helpful in pre-venting attacks that gain information leaked during the modular
exponentiation operation due to electromagnetic radiation, system performance uctuations, changes in
power consumption, etc. since the exponent bits change with each operation.
11 Further Work
Timing attacks can potentially be used against other cryptosystems, includ-
ing symmetric functions. For example, in software the 28-bit C and D values
in the DES[4] key schedule are often rotated using a conditional which tests
whether a one-bit must be wrapped around. The additional time required to
move nonzero bits could slightly degrade the cipher’s throughput or key setup
time. The cipher’s performance can thus reveal the Hamming weight of the key,
which provides an average of P56
n=0 _56
n _ 256 log2 _ 2
56
_56
n __ _ 3:95 bits of key infor-
mation. IDEA[3] uses an f() function with a modulo (216 + 1) multiplication operation, which will usually
run in non-constant time. RC5[7] is at risk on platforms where rotates run in non-constant time. RAM
cache hits can produce timing characteristics in implementations of Blow_sh[11], SEAL[9], DES, and other
ciphers if tables in memory are not used identically in every encryption. Additional research is needed to
determine whether speci_c implementations are at risk and, if so, the degree of their vulnerability. So far,
only a few speci_c systems have been studied in detail and the attacks against CRT/Montgomery RSA and
DSS are currently theoretical.
Further re_nements to the attack may also be possible. A direct attack against p and q in RSA with the
Chinese Remainder Theorem would be partic-ularly important.
12 Conclusions
In general, any channel which can carry information from a secure area to the outside should be studied as
a potential risk. Implementation-speci_c timing 9 characteristics provide one such channel and can
sometimes be used to com-promise secret keys. Vulnerable algorithms, protocols, and systems need to be
revised to incorporate measures to resist timing cryptanalysis and related at-tacks.
References
1.   D. Chaum, \Blind Signatures for Untraceable Payments,” Advances in Cryptology:
     Proceedings of Crypto 82, Plenum Press, 1983, pp. 199-203.
2.   W. Diff and M.E. Hellman, \New Directions in Cryptography,” IEEE Transac-tions on Information
     Theory, IT-22, n. 6, Nov 1976, pp. 644-654.
3.   X. Lai, On the Design and Security of Block Ciphers, ETH Series in Information Processing, v. 1,
     Konstanz: Hartung-Gorre Verlag, 1992.
4.   National Bureau of Standards, \Data Encryption Standard,” Federal Information