Double Spending
Double Spending
net/publication/351136583
CITATIONS                                                                                                 READS
0                                                                                                         573
5 authors, including:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Huawei Huang on 13 May 2021.
                      Jian Zheng∗ , Huawei Huang∗ , Canlin Li∗ , Zibin Zheng∗ Song Guo†
            ∗ School
                   of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, China
   † Department of Computing, The Hong Kong Polytechnic University, Hong Kong. song.guo@polyu.edu.hk
Abstract—Bitcoin is currently the cryptocurrency with the               : Block mined by an honest miner
largest market share. Many previous studies have explored               : Block mined by the attacker                           5 Release all blocks
the security of Bitcoin from the perspective of blockchain               2 Forking attack
mining. Especially on the double-spending attacks (DSA),
some state-of-the-art studies have proposed various analytical
models, aiming to understand the insights behind the double-         1 Target TX is released   3 Pack the target TX          4 Confirmation position
spending attacks. However, we believe that advanced versions
of DSA can be developed to create new threats for the Bitcoin                     Figure 1. Illustration of a double-spending attack.
ecosystem. To this end, this paper mainly presents a new type of    most of the mining power is centralized in a small number of
double-spending attack named Adaptive DSA in the context of         mining pools. Such centralization brings a high vulnerability
the Bitcoin blockchain, and discloses the associated insights. In   of suffering from mining attacks. Furthermore, in Bitcoin
our analytical model, the double-spending attack is converted       market, the number of large-amount transactions is increas-
into a Markov Decision Process. We then exploit the Stochastic      ing when the number of exchanges grows. The aggregated
Dynamic Programming (SDP) approach to obtain the optimal            money in exchanges draws great attention from cryptocur-
attack strategies towards Adaptive DSA. Through the proposed        rency hackers. Once the double-spending attack occurs in
analytical model and the disclosed insights behind Adaptive         an exchange, e.g., the Mt. Gox attack [7], the Bitcoin
DSA, we aim to alert the Bitcoin ecosystem that the threat of       market will experience a drastic fluctuation. Therefore, the
double-spending attacks is still at a dangerous level.              security of the Bitcoin blockchain has raised great attention,
                                                                    especially in the context of the real double-spending attacks
Index Terms—Bitcoin Blockchain, Double-Spending Attack              aforementioned.
                                                                        Literature Review. The double-spending attack is a
1. Introduction                                                     classic problem in blockchains. A number of previous
                                                                    works have explored the double-spending attacks on Bitcoin
                                                                    blockchain. Nakamoto [1] raised the problem of double-
    Following the drastically increasing market price of            spending attacks for the first time in the Bitcoin system.
Bitcoin [1] in the past few months, the cryptocurrencies            Pinzon et al. [8] proposed two new double-spending attacks:
continually attract ever-growing interests from all over the        race attack and fanny attack. The race attack achieves attack-
world. However, cryptocurrencies, e.g., Bitcoin, face the           ing by controlling miner fees. The fanny attack chooses to
inherent threat of Double-Spending Attacks (DSA). For               control block broadcast time to achieve attacking. Liao et al.
example, the Bitcoin Gold (BTG) [2] was stolen more than            [9] found that a small number of attackers can increase the
388200 BTG coins by double-spending attacks on May                  probability of a 51% double-spending attack by connecting
2018, and suffered from a loss of about 18.6 million dollars.       rational nodes. Biais et al. [10] proved that it is profitable
On January 2019, the Ethereum Classical [3] blockchain has          to try a 51% double-spending attack when the transaction
occurred the roll-back anomaly, which was very possibly             price meets certain conditions, but such an attack requires a
attacked by double-spending attacks. This anomaly led to a          large amount of hashpower and money. Gervais et al. [11]
loss of around 1.1 million dollars. Recently on December            proposed a simple double-spending attack strategy model
2020, the famous cryptocurrency Aeternity [4] also was              based on Markov Decision Processes (MDP) and revealed
attacked by the double-spending attack, resulting in a huge         the effectiveness of the double-spending attack strategy.
loss of approximately 39 million Aeternity coins.
                                                                        Through reviewing those representative analytical mod-
    Since Bitcoin has become the top market-share cryp-             els proposed in the past few years, we have the following
tocurrency [5], [6], many organizations have invested a large       findings.
amount of money as the strategic digital assets. In reality,
Bitcoin miners prefer to join in a mining pool because of               •     Many existing attack policies [12], [13] aim at their
brutal competition of Proof-of-Work (PoW) based mining                        success probabilities. However, a high success attack
among miners. The mining-pool phenomenon leads to that                        probability does not mean a high return of reward.
too few. Under such the unfavourable situation, the success                   p, q: the probabilities of state
                                                                                    transition
probability of the attacker is very low. To prevent a large loss
                                                                                                       q
                                                                                                                                          si , j +1
brought by the failed double-spending attacks, the attacker
is suggested giving up the attack when the situation is
                                                                                                       p                                  si +1, j
unfavourable. Therefore, we are motivated to propose the                                si , j
Adaptive DSA. In the following, we analyze the Adaptive
DSA by particularly evaluating those unfavourable situations             Figure 4. The decision of Adaptive DSA di,j = 1: keep attacking. A
illustrated in Fig. 2.                                                   decision di,j =1 also triggers state transition: si,j → si+1,j or si,j+1 .
     To calculate the reward under the Adaptive DSA, we
let cost denote the cost that the attacker spends during the
process of any new block is generated. To achieve a DSA,                 3.2.2. Attack-Decision Matrix. We then let di,j = 0/1
the attacker has to keep generating new blocks in his fork.              denote the attack decision of the attacker while the attack is
Thus, cost includes not only the maintaining fee to run                  in a specific decision point si,j . In the attack-decision matrix
the mining machines, but also the opportunistic loss if the              {di,j }, every element di,j values 0 or 1. The definition of
attacker were using his hashpower to mine blocks just as                 the binary decision is as follows.
an honest miner. Once the attacker succeeds, its reward is
calculated as the mining reward minus to the maintaining                                          
fee. Thus, we have cost = p · d, p > 0.                                                           1,
                                                                                                                 if i = j = 0;
     To the main chain, when the number of confirmation                                di,j      = 0,             if i = z or j = z;
                                                                                                  
                                                                                                  0 or 1,
blocks reaches z in a faster speed than the attacker’s fork,                                                      otherwise.
the target transaction is viewed as confirmed deeply. The
attacker has to quit the current attack, and prepare for the
                                                                              When di,j = 0, the attacker releases the fork, resulting
next. That is, when j > z , the attacker stops mining in
                                                                         in 3 possible cases shown in Fig. 3. Case-1: if i > j , the
its fork. On the other side, once the attacker generates the
                                                                         attacker dominates the block mining during the attack, and
(z + 1)-th block before the main chain outputs the z -th
                                                                         receives the block-mining reward immediately. Furthermore,
block, the double-spending attack succeeds. To maximize
                                                                         if i > j & j ≥ z , the attacker succeeds the double-spending
the profit of the attack, the attacker may also keep mining
                                                                         attack. Case-2: if i < j , the attacker receives nothing. Case-
more blocks in his fork, before the main chain discovers its
                                                                         3: if i = j , the attacker will follow the fork mined by
z -th confirmation block. Therefore, when i ≥ z & i > j , the
                                                                         himself.
attacker will keep attacking in his fork. Except the two cases
described above, the attacker cannot make a determined                        The decision di,j = 1 indicates that the attacker still
decision when 0 ≤ i, j ≤ z − 1. The goal of this section                 keeps attacking in the fork chain until the next decision
is trying to find an optimal attack-decision matrix such that            point shows up. As shown in Fig. 4, the current decision
the profit of Adaptive DSA can be maximized.                             point si,j (0 ≤ i, j ≤ z − 1, i, j ∈ N ) will be transferred to
                                                                         either state si+1,j or state si,j+1 , 0 ≤ i, j ≤ z − 1.
3.2. Terminologies
                                                                         3.2.3. Occurrence-Probability Matrix. In the occurrence-
3.2.1. Decision Point. Normally, an honest miner would not               probability matrix, represented by {Pi,j }, each element Pi,j
change the chain it follows, either after the main chain or              describes the probability of achieving a decision point si,j
a fork. Only when a new block is generated, the miner has                during an attack under a given decision matrix {di,j }. In
to decide whether to change the fork it follows currently                particular, s0,0 is always the starting point of any attack.
or not. Therefore, we define the instant when a new block                Thus, P0,0 = 1. As described in section 3.2.2, si,j stems
is generated as a decision point, which is denoted by si,j ,             from si−1,j with probability p if di−1,j = 1, or si,j−1
(i ∈ N , j ∈ N ). Here i and j represent the numbers of                  with probability q if di,j−1 = 1. Note that, if j = 0 or
the attacker’s blocks in the fork chain and the confirmation             i = z , si,j can only stem from si−1,j with probability p.
blocks in the main chain, respectively. When reaching a                  Similarly, if i = 0 or j = z , si,j can only stem from si,j−1
decision point, the attacker has to decide whether to continue           with probability q . Therefore, we can define the occurrence-
attacking or quit.                                                       probability matrix {Pi,j } as follows.
                                                                         3.4.1. SDP-based Formulation. In the context of a
                                                                        decision-making problem, the key elements of the SDP
           1, if i = j = 0;
                                                                         model are described as follows.
         
         
         
            i−1,j · p · Pi−1,j , if j = 0 or i = z;
         d
Pi,j   =                                                                      Stages: In a z × z decision matrix, each sub-diagonal
         
         d i,j−1 · q · Pi,j−1 , if i = 0 or j = z;                      represents a stage. We mark the sub-diagonal as 0, 1, 2, · · · ,
         
         d
            i−1,j · p · Pi−1,j + di,j−1 · q · Pi,j−1 , otherwise.        2z -1 from the upper left to the lower right, and the whole
                                                                (1)      system has a total of 2z stages.
                                                                              States: The set of states in stage n is denoted as S[n] =
3.3. Profit-Maximization Problem                                         {si,j |i + j = n} ∪ {s0i,j |i + j = n − 1}, when n > 0.
                                                                         Especially, if n = 0, S[n] = {si,j |i + j = n}. In addition,
     Given an attack-decision matrix {di,j }, we can derive              s0i,j is a special state indicating that the attacker releases all
the occurrence probability matrix {Pi,j } according to the               the sub-chain and begins to mine on the longest chain.
stochastic state transition. Next, the expected reward can                    Decisions: We use di,j to denote the decision variable at
be also calculated. Let Ri,j denote the expected reward                  si,j , di,j ∈ Ui,j , where Ui,j denotes the set of all decisions.
when the attacker stops attacking at a decision point si,j ,                  Transition probability: The transition-probability func-
i.e., di,j turns to 0 at si,j . When stopping attacking at si,j ,        tion T (si,j , di,j ) returns the probability that the current state
the attacker will release the blocks mined in his fork, and              si,j transfers to the next state by executing the decision
participate in the regular mining just as an honest miner                di,j . In SDP framework, we can only get the probability
following the main chain. Thus, Ri,j is calculated by the                distribution of the next state instead of accurate states.
coinbase reward minus the mining cost:                                   Referring to section 3.2, the transition-probability function
                                                                         is written as:
 Ri,j =                                                                                             
                                                                                                   p, if si,j → si+1,j , di,j = 1;
                                                                                                    
 −(i + j) · cost, if i < j;
                                                                                 T (si,j , di,j ) = q, if si,j → si,j+1 , di,j = 1;
 i · d − (i + j) · cost, if z > i > j;
 
 
                                                                                                   1, if s → s0 , d = 0;
                                                                                                                                         (4)
                                                                                                            i,j     i,j  i,j
 p · i · d − (i + j) · cost, if z > i = j;
 
 
                                                                   (2)                                0 ≤ i, j ≤ z − 1, i, j ∈ N.
   p[b + d(z + 1) − (i + j + 1) · cost] + q · Ri,j+1 ,
                                                                             Reward function: The reward function g(si,j , di,j ) re-
 
  if i = z, j ≤ z − 2;
 
 
 
 
 
  p[b + d(z + 1) − (i + j + 1) · cost]+                                 turns the reward when executing a decision di,j at the current
                                                                        state si,j . The reward function is defined as:
      q[p(b + z · d) − 2z · cost], if i = z, j = z − 1.
 
                                                                                                         (
    With f ({di,j }) denoting the expected reward function                                                 0,     if di,j = 0;
                                                                                       g(si,j , di,j ) =
under a decision matrix {di,j }, the goal is to strive for the                                             −cost, if di,j = 1;      (5)
optimal attack-decision matrix such that the expected re-                                         0 ≤ i, j ≤ z − 1, i, j ∈ N.
ward is maximized. We then present the profit-maximization
problem as follows.                                                      3.4.2. Optimal Decision-Making. When the decision ma-
                            z X
                            X z                                          trix {di,j } is fixed, the profit of the attacker can be derived.
        max f ({di,j }) =              (1 − di,j ) · Pi,j · Ri,j         Let Jn (si,j ) denote the profit matrix, where n = i+j . Recall
                             i=0 j=0                                     that, our objective is to maximize the expected profit by
                                                                   (3)   deciding the optimal decision matrix. According to Bellman
        s.t. Eq. (1) and Eq. (2).
                                                                         optimality theorem [15], the objective function is expressed
        Variables : di,j ∈ {0, 1}, 0 ≤ i, j ≤ z, i, j ∈ N.               as:
    Solving the problem (3) is infeasible when z is large,
since the problem complexity increases exponentially fol-                 Jn (si,j ) = max{g(si,j , di,j ) + E[Jn+1 |si,j , di,j ]}
                                                                                        di,j
lowing z . Therefore, we need to design a complexity-
efficient approach to solve this problem.                                  = max{g(si,j , di,j ) + T (si,j , di,j ) · Jn+1 (S[n + 1])}
                                                                               di,j                                                      (6)
                                                                           = max{Jn+1 (s0i,j ), −cost + p · Jn+1 (si+1,j )
3.4. Problem-Solving based on Stochastic Dynamic
                                                                             + q · Jn+1 (si,j+1 )}, 0 ≤ i, j ≤ z − 1, i, j ∈ N.
Programming (SDP)
                                                                             where s0i,j in fact indicates the special state occurring
   In fact, the optimization problem (3) can be viewed as a              in the stage S[n + 1]. In addition, we should define the
two-dimensional multi-stage Stochastic Dynamic Program-                  boundary condition of Jn (si,j ). For the right boundary (i.e.,
ming (SDP) problem [15]. At the decision point si,j , we                 0 ≤ i ≤ z − 1, j = z ), the attacker is failed and gains
only need to consider the current state as well as the future            nothing. Thus, Ji,j = 0 (0 ≤ i ≤ z − 1, j = z). For the
expected reward, regardless what decisions have been made                lower boundary (i.e., 0 ≤ j ≤ z − 1, i = z ), the target
and how to get to that state. In other words, the problem (3)            transaction has not been confirmed and the attacker should
has the Markovian property.                                              keep mining until j = z . The attack is successful if i = z +
 Algorithm 1: SDP for Adaptive DSA
                                                                                                     
   Input : b, p, d, z, cost.
   Output: decision matrix {di,j }, expected reward.
                                                                                                     
5 H Z D U G % 7 &