Infranet: Circumventing Web Censorship
and Surveillance
Nick Feamster, Magdalena Balazinska,
Greg Harfst, Hari Balakrishnan, David Karger
M.I.T. Laboratory for Computer Science
http://nms.lcs.mit.edu/infranet/
The Big Picture
htt
p://
ww Internet
w.c
nn.
com
CENSOR
http://nms
.lcs.mit.ed
u
nms.lcs.mit.edu cnn.com
The Big Picture
htt
p://
ww Internet
w.c
nn.
com
CENSOR
http://nms
.lcs.mit.ed
u
s . m i t . e d u"
p : / / n m s .lc
"ht t m ) nms.lcs.mit.edu cnn.com
. c o
w w . cnn
h t t p : //w
(
How Infranet Works
http://nms.lcs.mit.edu/ http://
http://nms.lcs.mit.edu/papers/ http://www.
http://nms.lcs.mit.edu/papers/usenix.html http://www.cnn.com
CENSOR
nms.lcs.mit.edu cnn.com
<html><title>cnn.com
</title> ...
Use Infranet requester proxy (on localhost)
Upstream request in sequence of HTTP requests
Downstream response in images
Censor
Restrictive government, corporate firewall, etc.
Discovery Attacks: Notice unusual-looking Web traffic.
monitors Web access for "inappropriate use"
watch Web traffic for inappropriate access attempts
watch for suspicious looking Web access patterns
watch for use of circumvention software
watch for suspicious inter-request times
Disruptive Attacks: Keep the endpoints from talking.
blocks access to certain Web sites
attempts to block access to circumvention software (e.g., blocking
SSL, disrupting communication, etc.)
Design Goals
Deniability for clients
Can’t confirm that a client is intentionally retrieving censored data
Statistical deniability for clients
Web traffic doesn’t look unusual
Covertness for servers
Can’t discover a server that is serving censored content
Defense against blocking
Communication Robustness
Should be difficult to disrupt request/transfer of censored content
Reasonable Performance
Related Systems: Triangle Boy, Peekabooty, etc.
Deniability for clients
Existing systems rely on SSL, vulnerable to fingerprinting
Statistical deniability for clients
SSL traffic looks suspicious
No attempt to conceal suspicious traffic patterns
Covertness for servers
Servers make no attempt to conceal their existence
Suspicious traffic patterns may result in discovery and blocking
Communication Robustness
SSL can be blocked (e.g., unsigned server certificates)
Downstream Communication ("Downloading")
Requester Censor Responder
http://nm
s.lcs.mit.e
du/people
.html
http://nms.lc
s.mit.edu/nic
k.jpg Embed
<html><title>cnn.
http://nms.lc
s.mit.edu/m
agda.jpg
Embed
.com</title>...
Embed data in images, recover by shared secret
Steganography is not ideal: can’t reuse cover image
Web cams are wonderful.
Upstream Communication ("Requesting")
Requester Censor Responder
http://n
ms.lcs.m
it.edu/
Decode
http:// http://
nms.lc
s.mit.e
du/pap
ers.htm
l
Decode
http:// http:// www.
nms.lc
s.mit.e
du/use
nix.htm
l Decode
http://www. cnn.com
Hidden message => sequence of HTTP requests
Mapping function: secret, critical to deniability
Simple Schemes: Covertness/Bandwidth
Odd/Even Links
Covertness: Requester may ask for any one of half of the links at
any given time
Bandwidth: 1-bit per visible HTTP request
Links modulo k
Covertness: Requester asks for any of N/k links
Bandwidth: lg(k) bits per visible HTTP request
Static Mapping
Covertness: potentially quite bad...
Bandwidth: M bits per request
Range-Mapping: Web Surfing, 20 Questions-Style
Assume: Some set of censored URLs are commonly requested
Responder tells requester
the boundaries (split-strings) for ranges in this set, and
the mapping between visible HTTP requests and split-strings
Requester tells responder
a visible HTTP request
Visible Requests Split−strings Visible Requests Split−strings
0% 25%
http://nms.lcs.mit.edu/projects.html http://www.amazon.com/ http://nms.lcs.mit.edu/nick.html http://www.cnn.com/
http://nms.lcs.mit.edu/people.html http://www.microsoft.com http://nms.lcs.mit.edu/magda.html http://www.ebay.com
http://nms.lcs.mit.edu/publications.html http://www.yahoo.com/ http://nms.lcs.mit.edu/greg.html http://www.java.com/
http://nms.lcs.mit.edu/history.html http://nms.lcs.mit.edu/hari.html http://www.microsoft.com/
100%
50%
but...not all requests are equally likely!
Getting Statistical Deniability
Divide the corpus according to more likely visible HTTP
requests.
Arithmetic coding says that our expected number of
requests is the same!
Visible Requests Split−strings
0%
http://nms.lcs.mit.edu/projects.html http://www.abc.com/
http://nms.lcs.mit.edu/people.html http://www.microsoft.com
http://nms.lcs.mit.edu/publications.html http://www.zdnet.com/
http://nms.lcs.mit.edu/history.html
100%
Range-Mapping
Search through set of frequently-requested censored
URLs achieves good upstream bandwidth.
Division of ranges according to conditional request
probabilities achieves deniability and covertness.
Idea can be applied over the space of all strings.
Statistical Deniability is Free
1
0.9
Fraction of all hidden URLs
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1 Without Deniability (mean: 5.9)
With Deniability (mean: 6.4)
0
0 1 2 3 4 5 6 7 8 9 10
Number of requests to transmit hidden URL
Server Covertness is Not Free
1
No Infranet
Infranet
Fraction of Requests
0.8
0.6
0.4
0.2
0
10 100 1000 10000
Bandwidth (kB/s)
Security Analysis
Discovery Attacks
Responders: discover and block access
Requesters: discover and...
Channel: reveal channel and its contents
Disruptive Attacks
Block access to responders
Break tunnel protocol
Remove or alter:
hidden content
hidden requests
Discovery Attacks
Intent: reveal requester or responder
Passive attacks to detect unusual
browsing patterns (including request timing)
HTTP requests/responses
Defense: Innocuous browsing patterns, no modification to HTTP
headers themselves
Impersonation attacks
censor establishes an Infranet responder and nails requesters
Defense: separate forwarding and decoding of symbols
Disruptive Attacks
Intent: disrupt communication between requester and
responder
Filtering (IP, DNS, port, etc.)
Defense: server covertness--never return the different content for
the same URL
Request Tampering
reorder/remove fields from HTTP headers
alter downstream content
Defense: don’t hide messages in HTTP headers, no cookies, use
hidden watermarks, etc.
Session Tampering
insert/remove/reorder HTTP requests and responses
Defense: error correction, headers to "echo" message to
responder, etc.
Conclusion
Infranet hides censored requests and responses in
innocuous-looking HTTP request/response streams
client deniability
server covertness
reasonable robustness
Future work
robustness
software distribution
server discovery
http://nms.lcs.mit.edu/infranet/
Communication Protocol Overview
Message exchange
Requests and responses are hidden in visible HTTP traffic
Requires a hiding function
Messages exchanged as a stream of symbols
Symbol construction
How symbols map to message fragments
A group of symbols is called an alphabet
Upstream: set of URLs on a responder’s Web site
Downstream: high-frequency components in images
Modulation
Specifies a particular function for symbol construction
Maps a sequence of visible HTTP requests to a hidden message
Hiding
Requires:
a message to hide,
a cover medium, and
a secret that reveals the hidden message
Without knowledge of secret, can’t reveal message (or
even prove that a message is being hidden)
Secret exchange...
Tunnel Protocol
Joint Infranet Infranet
Censor
FSM Requester Responder
HTTP Reque
st("/")
index.html)
HTTP Response(
Set User
ID
Tunnel Setup
Stream, ike y)
Initialize H down ( U init,HTTP Resp
Modulation
Function
Exchange Hup(EPKRes (skey)
p , HTTP R eq Stream, U )
Key init
TTP Resp Stre am, skey)
Update H down ( U tunnel, H
Modulation
Function
Transmit Hup(hidden req. ,
HTTP Req Stream
, Utunne )
Steady state
Request l
ey)
Resp Stream, sk
H down (content , HTTP
Transmit
Response
Requester learns upstream modulation function.
Responder learns secret key for downstream hiding.
Upstream Modulation
Infranet Infranet
Censor
Requester Responder
Up(msg frag 1) = url 1 HTTP Reque
st( url )
1
U -1p(url1) = msg frag 1
se
HTTP Respon
.
.
.
Up(msg frag n) = url n HTTP Reque
st( url )
n
U -1p(urln) = msg frag n
se
HTTP Respon
Hidden message => sequence of HTTP requests
Tunnel modulation function is secret
Choice of modulation function is important!
Range-Mapping
C
1
vmax
...
δ v2
δ * P(r2|rcurrent)
v1
δ * P(r1|rcurrent)
vmin
0
Lexicographically
ordered strings
stringmin s1 s2 ... stringmax
More Visible Links => Greater Upstream Bandwidth
20
90th percentile
50th percentile
15
Number of requests
10
0
1 2 4 8 16 32 64 128
Number of possible next requests at each page
Downstream Performance
60
55
50
Number of requests to retrieve data
45
40
35
30
25
20
15
10
5
0 10000 20000 30000 40000 50000 60000 70000 80000
Size of requested data (bytes)
Arithmetic Coding
F(x)
0 1
F’(x1)
0 1 0 1
x
{
x1
F(x)
Encode: Send F’(x)
Decode: Use decision tree (is F’(x) > P(0)?)
With a larger range in F(x), F’(x) requires fewer bits to
express the range.
Arithmetic Coding
F(x)
Codeword Length
First Request
F’(x1)
yahoo
Second Request
amazon cnn
{
x1
F(x)
More popular requests require fewer codewords
On average, number of requests will be the binary
entropy divided by the number of bits per request
"Skewing" how we divide the distribution: selecting
codewords in the distribution.
Steganography
Two stage process
Determine "redundant" bits
Select which of these bits to embed the encrypted message
(determined by PRG, initialized with seed)
Set state based on initial seed, embed a secondary seed
that selects a PRG that makes minimal modifications
Defend against statistical attacks that look at distribution
of DCT coefficients, etc.
RSA Public Key Cryptosystem
Sele
t p, q prime.
n = pq
(n) = (p
1) (q 1)
Sele
t an (e; d) su
h that ed 1mod n. Then:
C M e mod n
M C d mod n
M ed mod n
M 1+k(p 1)(q 1) mod n
M 1k(q 1) mod p (Euler)
M 1k(p 1) mod q
M mod n (CRT)