BLOCKCHAIN
The foundation behind Bitcoin
Sourav Sen Gupta
Indian Statistical Institute, Kolkata
CRYPTOGRAPHY
Backbone of Blockchain Technology
Component 1 : Cryptographic Hash Functions
HASH FUNCTIONS
Map variable-length input to constant-length output.
x = 101011101011001…0010110100101  →  h  →  y = 101110101001000110111100010101
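A quick way to see this property is with SHA-256 (the hash named on a later slide) from Python's standard hashlib module. This is a minimal illustration: inputs from zero bytes to a megabyte all give a 256-bit output.

```python
import hashlib

def digest_bits(message: bytes) -> int:
    """Return the length, in bits, of the SHA-256 digest of `message`."""
    return len(hashlib.sha256(message).digest()) * 8

# Inputs of very different lengths all map to a 256-bit output y.
for message in (b"", b"bitcoin", b"x" * 1_000_000):
    print(f"{len(message):>9} input bytes -> {digest_bits(message)} output bits")
```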
HASH FUNCTIONS
Finding a pre-image of a given output is not easy (pre-image resistance).
x = ?  →  h  →  y = 101110101001000110111100010101
HASH FUNCTIONS
Finding a colliding twin of a given input is not easy (second pre-image resistance).
x1 = 101011101011001…0010110100101  →  h  →  y = 101110101001000110111100010101
x2 = 1100101001011001…110010100110  →  h  →  y
HASH FUNCTIONS
Finding any colliding pair of inputs is not easy (collision resistance).
x1 = 101011101011001…0010110100101  →  h  →  y = 101110101001000110111100010101
x2 = 1100101001011001…110010100110  →  h  →  y
Collisions must exist (infinitely many inputs, finitely many outputs), but finding one is not easy.
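To see why "possible, but not easy" depends on the output length, the sketch below runs a birthday search against a deliberately weakened toy hash: SHA-256 truncated to 24 bits (the truncation is an illustrative assumption, not part of any real scheme). A collision appears after roughly 2^12 attempts; the same search against the full 256-bit output would need on the order of 2^128 hashes.

```python
import hashlib
from itertools import count

def truncated_hash(message: bytes, nbits: int) -> int:
    """Top `nbits` bits of SHA-256(message): a deliberately weakened toy hash."""
    full = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return full >> (256 - nbits)

def find_collision(nbits: int):
    """Birthday search: hash distinct inputs until two outputs repeat."""
    seen = {}
    for i in count():
        message = str(i).encode()
        y = truncated_hash(message, nbits)
        if y in seen:
            return seen[y], message, i + 1  # (x1, x2, number of hashes tried)
        seen[y] = message

x1, x2, tries = find_collision(24)
print(x1, x2, tries)  # typically a few thousand tries, near 2**12
```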
HASH FUNCTIONS
Minor input mismatch leads to major output mismatch (avalanche effect).
x1 = 101011101011001…0010110100101  →  h  →  y1 = 101110101001000110111100010101
x2 = 101010101011001…0010110100101  →  h  →  y2 = 110010100101100100110010100110
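The avalanche effect can be measured directly: flip one input bit and count how many of the 256 output bits of SHA-256 change. A minimal sketch; averaged over many inputs, roughly half (about 128) of the output bits flip.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Number of bit positions in which SHA-256(a) and SHA-256(b) differ."""
    ya = int.from_bytes(hashlib.sha256(a).digest(), "big")
    yb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(ya ^ yb).count("1")

def flip_first_bit(message: bytes) -> bytes:
    """Flip the lowest bit of the first byte: a one-bit input change."""
    return bytes([message[0] ^ 0x01]) + message[1:]

x1 = b"blockchain"
x2 = flip_first_bit(x1)  # b"clockchain"
print(differing_bits(x1, x2), "of 256 output bits differ")

# Averaged over many inputs, close to half (128) of the output bits flip.
trials = [str(i).encode() for i in range(200)]
average = sum(differing_bits(m, flip_first_bit(m)) for m in trials) / len(trials)
print(f"average: {average:.1f} bits")
```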
CONSTRUCTIONS
IV → f(·, m1) → f(·, m2) → … → f(·, mn) → h
Merkle-Damgård Construction
Example : SHA-256, used in Bitcoin
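The chaining structure can be sketched in a few lines. This is a toy under stated assumptions: the compression function f, the 32-byte block size and the all-zero IV are hypothetical choices for illustration (f is simply built from SHA-256), not the internals of SHA-256 itself. The padding step appends the message length, the usual Merkle-Damgård strengthening.

```python
import hashlib
import struct

BLOCK_SIZE = 32   # bytes per message block m_i (toy choice)
IV = bytes(32)    # toy all-zero initial value; SHA-256 fixes its own IV constants

def f(chaining_value: bytes, block: bytes) -> bytes:
    """Hypothetical compression function: 64 bytes in, 32 bytes out.
    Built from SHA-256 purely for illustration."""
    return hashlib.sha256(chaining_value + block).digest()

def pad(message: bytes) -> bytes:
    """Merkle-Damgard strengthening: 0x80, zero bytes, then the 64-bit bit-length."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK_SIZE)
    return padded + struct.pack(">Q", 8 * len(message))

def md_hash(message: bytes) -> bytes:
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = f(state, padded[i:i + BLOCK_SIZE])  # chain: state = f(state, m_i)
    return state  # the final chaining value is the hash h

print(md_hash(b"hello").hex())
```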
CONSTRUCTIONS
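Absorb m1, m2, …, mn through f, then squeeze out h1, …  (c : capacity part of the internal state)
Sponge Construction
Example : SHA-3 family (Keccak); Ethereum uses Keccak-256, which differs from the final SHA3-256 standard only in its padding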
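Python's hashlib does not expose the sponge internals, but SHAKE128 (an extendable-output function from the same SHA-3 family) makes the squeezing phase visible: asking for more output just keeps squeezing, so a shorter output is a prefix of a longer one. Note that hashlib.sha3_256 implements the standardised SHA3-256 padding, so it does not reproduce Ethereum's Keccak-256 hashes.

```python
import hashlib

message = b"absorb me"

# Absorb the message once, then squeeze out as many bytes as requested.
short_output = hashlib.shake_128(message).digest(16)
long_output = hashlib.shake_128(message).digest(64)
print(short_output.hex())
print(long_output.hex())

# Squeezing further only extends the output: the first 16 bytes agree.
print(long_output[:16] == short_output)  # True

# Fixed-length SHA3-256 from the same family (standard FIPS 202 padding).
print(hashlib.sha3_256(message).hexdigest())
```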
APPLICATIONS
commit(x) : choose a random nonce r, publish c = h(r || x)
verify(c, r, x) : check h(r || x) == c
Provably secure scheme for Commitment
Random nonce r must have a high min-entropy for this scheme to be secure.
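A minimal sketch of the commitment scheme with SHA-256 as h, assuming a 32-byte nonce from Python's secrets module as the high min-entropy r. Because r has a fixed length, r || x can be split unambiguously when the commitment is opened.

```python
import hashlib
import secrets

def commit(x: bytes) -> tuple[bytes, bytes]:
    """Return (c, r): publish c now, keep r and x private until opening."""
    r = secrets.token_bytes(32)          # 256-bit random nonce: high min-entropy
    c = hashlib.sha256(r + x).digest()   # c = h(r || x)
    return c, r

def verify(c: bytes, r: bytes, x: bytes) -> bool:
    """Opening: recompute h(r || x) and compare with the commitment."""
    return hashlib.sha256(r + x).digest() == c

c, r = commit(b"heads")
print(verify(c, r, b"heads"))  # True
print(verify(c, r, b"tails"))  # False
```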
APPLICATIONS
record(x) : store c = h(x)
verify(c, x) : check h(x) == c
Provably secure scheme for tamper-detection
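Unlike the commitment, tamper-detection needs no nonce because x is not meant to be hidden; what matters is that c is kept somewhere an attacker cannot modify. A minimal sketch with SHA-256 as h (the example strings are hypothetical):

```python
import hashlib

def record(x: bytes) -> bytes:
    """Store this digest somewhere the attacker cannot modify."""
    return hashlib.sha256(x).digest()        # c = h(x)

def verify(c: bytes, x: bytes) -> bool:
    return hashlib.sha256(x).digest() == c   # h(x) == c

document = b"pay 10 BTC to Alice"
c = record(document)
print(verify(c, document))                  # True
print(verify(c, b"pay 10 BTC to Mallory"))  # False: tampering detected
```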
DATA STRUCTURES
Hash Pointer : ( addr(data), hash(data) ), pointing to data
Tamper-evident data pointer = Hash Pointer
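A hash pointer can be modelled as an (address, digest) pair. In this sketch, which is an illustrative assumption rather than any real system's layout, a Python list stands in for memory and the list index plays the role of addr(data); dereferencing re-hashes the data and refuses to return it if it no longer matches.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class HashPointer:
    addr: int      # where the data lives (an index into `storage` below)
    digest: bytes  # hash(data) at the moment the pointer was created

def make_pointer(storage: list, addr: int) -> HashPointer:
    return HashPointer(addr, hashlib.sha256(storage[addr]).digest())

def dereference(storage: list, ptr: HashPointer) -> bytes:
    """Fetch the data and check it still matches the stored hash."""
    data = storage[ptr.addr]
    if hashlib.sha256(data).digest() != ptr.digest:
        raise ValueError(f"data at address {ptr.addr} has been tampered with")
    return data

storage = [b"block 0", b"block 1"]
ptr = make_pointer(storage, 1)
print(dereference(storage, ptr))   # b'block 1'
storage[1] = b"forged block"
# dereference(storage, ptr) would now raise ValueError
```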
DATA STRUCTURES
Block : ( data, HP(previous block), timestamp )
Tamper-evident linked data structure = Block
DATA STRUCTURES
Block ← Block ← Block ← Block ← Block   (each Block holds data, HP(previous block), timestamp)
Tamper-evident linked-list = Blockchain
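A minimal blockchain sketch, assuming JSON-serialised blocks hashed with SHA-256 and a list index standing in for the address part of each hash pointer. A verifier only needs to store the head hash (the O(1) commitment): changing any earlier block breaks a pointer, and rewriting every later block to repair the pointers changes the head hash.

```python
import hashlib
import json
import time
from dataclasses import dataclass, replace

GENESIS_PREV = "0" * 64  # placeholder hash pointer for the first block

@dataclass(frozen=True)
class Block:
    data: str
    prev_hash: str    # HP(previous block); the list index acts as the address
    timestamp: float

    def digest(self) -> str:
        payload = json.dumps([self.data, self.prev_hash, self.timestamp])
        return hashlib.sha256(payload.encode()).hexdigest()

def append(chain: list, data: str) -> str:
    """Append a block and return the new head hash (the O(1) commitment)."""
    prev = chain[-1].digest() if chain else GENESIS_PREV
    chain.append(Block(data, prev, time.time()))
    return chain[-1].digest()

def verify_chain(chain: list, head: str) -> bool:
    """Check every hash pointer, starting from a trusted head hash."""
    if not chain or chain[-1].digest() != head:
        return False
    if chain[0].prev_hash != GENESIS_PREV:
        return False
    return all(chain[i].prev_hash == chain[i - 1].digest() for i in range(1, len(chain)))

chain = []
for data in ["tx batch 0", "tx batch 1", "tx batch 2", "tx batch 3"]:
    head = append(chain, data)
print(verify_chain(chain, head))                   # True
chain[1] = replace(chain[1], data="forged batch")  # tamper with an old block
print(verify_chain(chain, head))                   # False
```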
DATA STRUCTURES
HP(root) → Node : ( data, HP(left), HP(right), timestamp )
Each Node reaches its left and right child Nodes through HP(left) and HP(right)
Tamper-evident binary-tree = Merkle Tree
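A common variant of the Merkle tree stores data only in the leaves (slightly different from the node layout above, which keeps data in every node). The sketch below uses single SHA-256 for brevity, whereas Bitcoin uses double SHA-256 over transaction ids, and duplicates the last node on odd-sized levels as Bitcoin does. A membership proof is one sibling hash per level, hence O(log n).

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _next_level(level: list) -> list:
    if len(level) % 2 == 1:
        level = level + [level[-1]]  # odd level: pair the last node with itself
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves: list) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_proof(leaves: list, index: int) -> list:
    """Sibling hashes from leaf to root, each tagged with 'sibling is on the left'."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling >= len(level):    # odd level: the node is its own sibling
            sibling = index
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof

def verify_proof(leaf: bytes, proof: list, root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

leaves = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
root = merkle_root(leaves)
proof = merkle_proof(leaves, 3)
print(len(proof), verify_proof(b"tx3", proof, root))  # 3 True
print(verify_proof(b"forged", proof, root))           # False
```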
DATA STRUCTURES
Property | Blockchain | Merkle Tree | Merkle Trie
Size of Commitment | O(1) | O(1) | O(1)
Append a Block/Node | O(1) | O(log n) | O(k)
Update a Block/Node | O(n) | O(log n) | O(k)
Proof of Membership | O(n) | O(log n) | O(k)
Structural Abstraction | List of Objects | Set of Objects | Set of (key, value) pairs
Used in | Bitcoin | Bitcoin | Ethereum
(n : number of blocks/nodes, k : length of a key in the trie)
QUESTIONS
Can any pointer-based data structure
be efficiently converted into a
Hash-Pointer based data structure?
Will such an exercise be at all useful in any use case?
Do these structures provide any additional advantage?
Component 2 : Digital Signature Schemes
DIGITAL SIGNATURE
(1) (sk, pk) = keygen(n)   (2) s = sign(sk, m)   (3) verify(pk, m, s)
Digital signature as a set of three algorithms
DIGITAL SIGNATURE
Correctness : if (sk, pk) = keygen(n), then verify(pk, m, sign(sk, m)) = True for every message m
DIGITAL SIGNATURE
Unforgeability : given pk and oracle access to sign(sk, ·) on messages m_i of its choice, an adversary
should not be able to create a valid message-signature pair (m, s) for a fresh message m (one never submitted to the oracle)
CONSTRUCTION
Elliptic Curve Digital Signature Algorithm (ECDSA)
ECDSA on curve $E(\mathbb{F}_p) = \{ (x, y) \in \mathbb{F}_p \times \mathbb{F}_p \mid y^2 = x^3 + 7 \}$ (secp256k1)
with base prime $p = 2^{256} - 2^{32} - 2^9 - 2^8 - 2^7 - 2^6 - 2^4 - 1$
CONSTRUCTION
Elliptic Curve group of size $|E(\mathbb{F}_p)| = q \approx p \approx 2^{256}$
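Parameter | Format | Range | Bit-size
sk | random | $\mathbb{Z}_q$ | 256
pk | $sk \times G$ | $E(\mathbb{F}_p)$ | 512
m | hash(M) | $\mathbb{Z}_q$ | 256
Signature | (r, s) | $\mathbb{Z}_q \times \mathbb{Z}_q$ | 512
(G : fixed base point of the curve, M : the message being signed)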
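The three algorithms on this curve can be sketched in pure Python with the standard secp256k1 constants. This is for teaching only, under clear assumptions: it is not constant-time, it draws k at random rather than deterministically, it applies single SHA-256 to M, and it skips Bitcoin-specific details such as low-s normalisation. Reusing a nonce k across two signatures would leak sk.

```python
import hashlib
import secrets

# secp256k1: the curve y^2 = x^3 + 7 over F_p used by Bitcoin
P = 2**256 - 2**32 - 2**9 - 2**8 - 2**7 - 2**6 - 2**4 - 1
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order q
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None  # a + (-a) = infinity
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P           # tangent slope
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P    # chord slope
    x = (lam * lam - a[0] - b[0]) % P
    y = (lam * (a[0] - x) - a[1]) % P
    return (x, y)

def scalar_mult(k, point):
    """Compute k x point by double-and-add (not constant-time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def keygen():
    sk = secrets.randbelow(N - 1) + 1    # sk in [1, q-1]
    return sk, scalar_mult(sk, G)        # pk = sk x G

def hash_to_int(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")  # m = hash(M)

def sign(sk, message: bytes):
    z = hash_to_int(message)
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh secret nonce for every signature
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * sk) % N
        if r != 0 and s != 0:
            return r, s

def verify(pk, message: bytes, signature) -> bool:
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    z = hash_to_int(message)
    point = point_add(scalar_mult(z * w % N, G), scalar_mult(r * w % N, pk))
    return point is not None and point[0] % N == r

sk, pk = keygen()
sig = sign(sk, b"pay 1 BTC to R1")
print(verify(pk, b"pay 1 BTC to R1", sig))  # True
print(verify(pk, b"pay 9 BTC to R1", sig))  # False
```

Verification works because $u_1 G + u_2\,pk = s^{-1}(z + r\,sk)\,G = kG$, whose x-coordinate reduces to r.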
APPLICATION
pk : published   sk : kept secret
verify(pk, m, sign(sk, m)) : anyone can verify, but without sk nobody can sign
Publish the public key pk as your Identity
Use the secret key sk to prove your identity
BITCOIN
Blockchain in Practice
BITCOIN
Ledger of Transactions between Pseudonymous Identities
Semi-Decentralised Publicly-Verifiable
Tamper-Resistant Eventually-Consistent
NOT BITCOIN
Economic transactions that we are familiar with
NOT BITCOIN
Centralised Account-based Ledger
NOT BITCOIN
Decentralised Account-based Ledger
NOT BITCOIN YET
Decentralised Transaction-based Ledger
TRANSACTION
Tx → Tx, signed by its owner
Network verifies the Signature
TRANSACTION
Tx → Tx, owner identified by pk, signed by sk
Network verifies the Signature
TRANSACTION
Input : Array of previous Transactions | Output : Array of recipient Addresses
Sender(s) : earlier Tx paid to pk1, pk2, pk3, spent with signatures from sk1, sk2, sk3
Recipient(s) : R1, R2, R3, each identified by a public key pk
Network verifies the Signature(s)
TRANSACTION
Input : Array of previous Transactions | Output : Array of recipient Addresses
Input Transactions : Tx (pk1), Tx (pk2), Tx (pk3)
Recipients : R1, R2, R3 (each identified by a pk)
Signatures : by sk1, sk2, sk3
Network verifies the Signature(s)
Metadata
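The input/output structure can be sketched as a check against the set of unspent outputs. This is a simplified model under stated assumptions: the transaction ids and addresses are hypothetical strings, amounts are in satoshi (1 BTC = 10^8 satoshi), and the per-input signature checks with sk1, sk2, … are omitted. Any difference between inputs and outputs is left to the miner as a fee.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutPoint:
    txid: str    # id of an earlier transaction (hypothetical string ids here)
    index: int   # which of its outputs is being spent

@dataclass(frozen=True)
class TxOut:
    address: str  # recipient address, standing in for a public key pk
    amount: int   # value in satoshi

@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple   # OutPoints: the "array of previous transactions"
    outputs: tuple  # TxOuts: the "array of recipient addresses"

def apply_tx(utxos: dict, tx: Tx) -> int:
    """Validate tx against the unspent-output set, update the set, return the fee.
    Signature checks (one per input) are omitted in this sketch."""
    if len(set(tx.inputs)) != len(tx.inputs):
        raise ValueError("same output spent twice in one transaction")
    if any(op not in utxos for op in tx.inputs):
        raise ValueError("input does not exist or is already spent")
    total_in = sum(utxos[op].amount for op in tx.inputs)
    total_out = sum(out.amount for out in tx.outputs)
    if total_out > total_in:
        raise ValueError("outputs exceed inputs")
    for op in tx.inputs:
        del utxos[op]
    for i, out in enumerate(tx.outputs):
        utxos[OutPoint(tx.txid, i)] = out
    return total_in - total_out

utxos = {
    OutPoint("tx-a", 0): TxOut("pk1", 50_000),
    OutPoint("tx-b", 1): TxOut("pk2", 30_000),
}
tx = Tx("tx-c", (OutPoint("tx-a", 0), OutPoint("tx-b", 1)),
        (TxOut("R1", 60_000), TxOut("R2", 15_000)))
print(apply_tx(utxos, tx))  # 5000: the fee
```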
TRANSACTION
Input(s)
Output(s)
Data obtained from blockchain.info
LEDGER
Tx Tx Tx Tx Tx Tx Tx Tx
Tx Tx Tx Tx Tx Tx Tx Tx Tx Tx Tx Tx
Decentralised Transaction-based Ledger
BLOCK
Data obtained from blockchain.info
BITCOIN
Transaction → Mining
MINING
Transaction → Computational Lottery (Puzzle)
Find r such that hash(r || m) < C, where m is the candidate block and C is the difficulty target
Given the existing blocks at a given time, the winner writes the next block
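The lottery can be sketched as a brute-force search for r. In this toy, C = 2^(256 - difficulty_bits) and a single SHA-256 is applied to r || m with an 8-byte nonce; Bitcoin actually applies double SHA-256 to an 80-byte block header, so these are simplifying assumptions. The key asymmetry is that finding r takes about 2^difficulty_bits hashes on average, while checking it takes one.

```python
import hashlib
from itertools import count

def mine(m: bytes, difficulty_bits: int) -> tuple[bytes, bytes]:
    """Find a nonce r with SHA-256(r || m) < C, where C = 2**(256 - difficulty_bits)."""
    C = 1 << (256 - difficulty_bits)
    for i in count():
        r = i.to_bytes(8, "big")                 # candidate nonce
        y = hashlib.sha256(r + m).digest()
        if int.from_bytes(y, "big") < C:
            return r, y                          # about 2**difficulty_bits tries expected

def check(r: bytes, m: bytes, difficulty_bits: int) -> bool:
    """Verification costs a single hash, however long the search took."""
    y = hashlib.sha256(r + m).digest()
    return int.from_bytes(y, "big") < 1 << (256 - difficulty_bits)

m = b"hash pointer to previous block || transactions"
r, y = mine(m, 16)
print(r.hex(), y.hex())   # y starts with at least four hex zeros
```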
MINING
Data obtained from blockchain.info
BITCOIN
Framework — Decentralised peer-to-peer collaborative network
Goal : All peers should agree on a sequence of transactions
BITCOIN
Publicly-Verifiable
as the complete ledger and the hash function are public
BITCOIN
Tamper-Evident / Tamper-Resistant
as the ledger is connected through a chain of hash pointers
BITCOIN
Eventually-Consistent
as the longest chain (the one with the most cumulative proof-of-work) eventually persists as the main chain
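When the difficulty is constant, "most work" and "longest" coincide; when it varies, nodes compare cumulative work. The sketch below mirrors the per-block work measure used by Bitcoin Core, 2^256 / (target + 1), and describes each fork simply as a list of its blocks' targets (a simplification for illustration).

```python
def block_work(target: int) -> int:
    """Expected number of hashes needed to find a block below `target`."""
    return (1 << 256) // (target + 1)

def best_chain(forks: list[list[int]]) -> list[int]:
    """Each fork is given as the list of its blocks' targets; pick the most total work."""
    return max(forks, key=lambda targets: sum(block_work(t) for t in targets))

easy = [1 << 250] * 5   # five blocks, each found with about 2**6 hashes
hard = [1 << 240] * 3   # three blocks, each needing about 2**16 hashes
print(best_chain([easy, hard]) is hard)   # True: fewer blocks but more work
```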
BITCOIN
Data obtained from blockchain.info
BITCOIN
Semi-Decentralised
as mining is dominated by those with the most computational power
BITCOIN
Data obtained from blockchain.info
Robin Yao (BW), Wang Chun (F2Pool), Marshall Long (FinalHash), Pan Zhibiao (Bitmain)
Liu Xiang Fu (Avalon), Sam Cole (KnCMiner) and Alex Petrov (BitFury)
BITCOIN
Semi-Decentralised Publicly-Verifiable
Tamper-Resistant Eventually-Consistent
ECONOMICS
The success story of Bitcoin
BITCOIN
Data obtained from blockchain.info
SECURITY
The threat from Bitcoin
BITCOIN
Transactions : Completely transparent and public
Identities : Opaque and pseudonymous addresses
~ 170 Million bitcoin addresses
~ 150 Million bitcoin transactions
~ 80 GB of compressed raw data
~ 80% of transactions have < 2 inputs
~ 90% of transactions have < 3 outputs
BITCOIN
Identities : Opaque and pseudonymous addresses
Anyone can create arbitrarily many identities
All identities “look” the same on the network
~ 170 Million bitcoin addresses
~ 150 Million bitcoin transactions
This provides “anonymity” for Bitcoin transactions.
BITCOIN
Data obtained from blockchain.info
BITCOIN
Data obtained from blockchain.info
Dark Marketplaces to buy and sell Drugs
Dark Marketplaces to buy and sell Guns and Fake IDs
BITCOIN
Is it still possible to trace transactions and identities?
DE-ANONYMIZATION
Potential solution to the threat from Anonymity
TRANSACTION
Senders S1, S2, …, Sn  →  Tx  →  Recipients R1, R2, …, Rm
EXAMPLE #1
S1 = 1FLa9NcXJPA2XvF34LRuB4zbXX4Ws32dpL  →  Tx  →  R1 = 18rdKmjrg1EawxgiVT3ikLExj6GWS2MNCk
Note : Single recipient with an exact match of input to output — highly unlikely.
EXAMPLE #2
S1 = 1Ao6mKMEXxCVNVAuGjfLXZ3Zf43hd3yAEq  →  Tx
Tx  →  R1 = 1H3bY2Cv1pmn8ffTdyeRvZAUjNJC1giQHm
Tx  →  R2 = 16pDB5bvoqRGvoH32GaJLfsEcaMc2T9xDr
Note : Nice complete denomination along with a random change.
EXAMPLE #3
S1 = 1PXzMrz8KBNEkTt3Wnuqy4axiWszbyQKyE  →  Tx
Tx  →  R1 = 19onWuLmjXGVfc7oUAEVuy9Yd3jxqhsUbK
Tx  →  R2 = 1AASWBCGveXH6H5yTCZW2x7uZrawDiqp4U
Note : 0.01121504 BTC = 6.50 USD at the time of transaction.
EXAMPLE #4
S1 = 19SZcQ2CzJacQZE9rYwQjsfcBKMWDNwBWD  →  Tx
S2 = 13Zjnzx8VxtLUEiYcrVXKp5sLucLMvBqaG  →  Tx
Tx  →  R1 = 1PLjv1VzGEKxtM2FnRzg2FmDjen9trUBrh
Note : Two arbitrary inputs exactly match up to a desired output — highly unlikely.
EXAMPLE #5
S1 = 1Djvb34FNpNXtrbbjaQeERZf68cyUdWyzd (6.13 USD)  →  Tx
S2 = 17atn5sagYRBUvzgFLd9bUjWF4yStkdokW (6.03 USD)  →  Tx
Tx  →  R1 = 1AffmSG4tcNRjcgTWTnS6TM3cWPeeA9EVd (4.10 USD)
Tx  →  R2 = 1Nq612zwhEZDBNz2AeWKZxD6LvwiLm6cQU (7.95 USD)
Note : Two input transactions coupled for a payment plus some random change.
CLUSTERING
1PXzMrz8KBNEkTt3Wnuqy4axiWszbyQKyE
19onWuLmjXGVfc7oUAEVuy9Yd3jxqhsUbK
1Djvb34FNpNXtrbbjaQeERZf68cyUdWyzd
1AASWBCGveXH6H5yTCZW2x7uZrawDiqp4U
1FLa9NcXJPA2XvF34LRuB4zbXX4Ws32dpL
1AffmSG4tcNRjcgTWTnS6TM3cWPeeA9EVd
17atn5sagYRBUvzgFLd9bUjWF4yStkdokW
18rdKmjrg1EawxgiVT3ikLExj6GWS2MNCk
1Ao6mKMEXxCVNVAuGjfLXZ3Zf43hd3yAEq 16pDB5bvoqRGvoH32GaJLfsEcaMc2T9xDr
19SZcQ2CzJacQZE9rYwQjsfcBKMWDNwBWD 1H3bY2Cv1pmn8ffTdyeRvZAUjNJC1giQHm
13Zjnzx8VxtLUEiYcrVXKp5sLucLMvBqaG 1PLjv1VzGEKxtM2FnRzg2FmDjen9trUBrh
1Nq612zwhEZDBNz2AeWKZxD6LvwiLm6cQU
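One standard clustering rule, the common-input-ownership heuristic, assumes that all input addresses of a single transaction are controlled by one owner (they all had to sign). The sketch below applies it with a union-find structure to the input addresses of Examples #1 to #5 above. It only links inputs; the clustering on this slide may also rely on other metrics, such as change-address guesses, which this sketch does not implement.

```python
class UnionFind:
    """Disjoint sets of addresses."""
    def __init__(self):
        self.parent = {}

    def find(self, a: str) -> str:
        self.parent.setdefault(a, a)
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)

def cluster(tx_inputs: list[list[str]]) -> list[set[str]]:
    """Common-input-ownership heuristic: all input addresses of one tx share an owner."""
    uf = UnionFind()
    for addresses in tx_inputs:
        for addr in addresses:
            uf.union(addresses[0], addr)
    groups = {}
    for addr in list(uf.parent):
        groups.setdefault(uf.find(addr), set()).add(addr)
    return list(groups.values())

# Input addresses of Examples #1 to #5
tx_inputs = [
    ["1FLa9NcXJPA2XvF34LRuB4zbXX4Ws32dpL"],
    ["1Ao6mKMEXxCVNVAuGjfLXZ3Zf43hd3yAEq"],
    ["1PXzMrz8KBNEkTt3Wnuqy4axiWszbyQKyE"],
    ["19SZcQ2CzJacQZE9rYwQjsfcBKMWDNwBWD", "13Zjnzx8VxtLUEiYcrVXKp5sLucLMvBqaG"],
    ["1Djvb34FNpNXtrbbjaQeERZf68cyUdWyzd", "17atn5sagYRBUvzgFLd9bUjWF4yStkdokW"],
]
for group in cluster(tx_inputs):
    print(sorted(group))
```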
IDENTIFICATION
CLUSTERING
The Unreasonable Effectiveness of Address Clustering — Harrigan and Fretter, May 2016
DE-ANONYMIZATION
Passive : Analytics on 80 GB of Bitcoin blockchain data
— Clustering of Bitcoin Addresses with suitable definition of Metrics
— Identification of the Clusters using known and/or leaked Addresses
Active : Injecting and tracking marked Bitcoin transactions
— Registering on Dark Marketplaces, Exchanges, and Mining Pools
— Using Addresses leaked from all these sources for Identification
Elliptic (https://www.elliptic.co/) does something similar in the UK.
We should try to build our own tool for de-anonymization.
BLOCKCHAIN
Versatile Toolkit for Protocols
BITCOIN SCRIPT
Data obtained from blockchain.info
POTENTIAL
With a powerful Scripting Language
Developing “Smart Contracts” on Blockchain
Smart Contracts, BitShares, Ripple, BitGold, Proof of Space, ZeroCoin, Zcash, ADePT, RSCoin,
Smart Properties, Proof of Existence, OpenBazaar, Bitcoin-NG, OneName, BigchainDB, Factom, Ethereum,
BitHealth, Retricoin, Namecoin, Proof of Commitment, SpaceMint, BitNation, Perma-Coin,
Proof of Stake, GHOST, Proof of Retrievability
“Bitcoin is an idea with disruptive ramifications.”
Thank you for listening!