
Byzantine Fault Isolation in the Farsite Distributed File System

John R. Douceur and Jon Howell


Microsoft Research
{johndo, howell}@microsoft.com

ABSTRACT

In a peer-to-peer system of interacting Byzantine-fault-tolerant replicated-state-machine groups, as system scale increases, so does the probability that a group will manifest a fault. If no steps are taken to prevent faults from spreading among groups, a single fault can result in total system failure. To address this problem, we introduce Byzantine Fault Isolation (BFI), a technique that enables a distributed system to operate with application-defined partial correctness when some of its constituent groups are faulty. We quantify BFI's benefit and describe its use in Farsite, a peer-to-peer file system designed to scale to 100,000 machines.

1 INTRODUCTION

Farsite [2] is a distributed peer-to-peer file system that runs on a network of desktop workstations and provides centralized file-system service. File storage in Farsite is secure, even though the system runs on unsecured machines. This security is established, in part, through the use of Byzantine Fault Tolerance (BFT), a well-known mechanism for building trusted services from untrusted components [14, 6]. BFT enables a service to continue functioning correctly as long as fewer than a third of the machines it is running on are faulty.

Farsite is designed to be scalable. As more and more workstations make use of the file-system service, the resources of these workstations become available for use by the system. However, the BFT technique cannot exploit these additional resources to achieve the greater throughput demanded by an increasing user base: adding machines to a BFT group decreases throughput rather than increasing it.

To achieve an increase in throughput with scale, Farsite partitions its workload among multiple interacting BFT groups. Unfortunately, as the count of BFT groups increases, so too does the probability that some group will contain enough faulty machines that the group will be unable to suppress the fault. If the system design does not account for the failure of one or more BFT groups, a single group failure can cause the entire system to fail.

The alternative to total failure is degraded operation, wherein individual group failures cause the system to operate in a way that is still partially correct, rather than completely unspecified. However, "partial correctness" is not something that can be defined in an application-independent fashion [11]. It is thus not possible to build a generic substrate that enables an arbitrary service to degrade gracefully in a meaningful and useful manner.

We have therefore developed a methodology for designing a distributed system of BFT groups, wherein a faulty group is prevented from corrupting the entire system. We call our method Byzantine Fault Isolation (BFI). BFI makes use of formal specification [15] to constrain the semantic behavior of a faulty system. The technique involves selectively weakening the system semantics and concomitantly strengthening the system design. Formal specification helps determine when these two processes have satisfactorily converged.

The next section surveys previous approaches to resisting Byzantine faults. Section 3 describes the Farsite file system, the context for our work. Section 4 quantifies the value of isolating Byzantine faults, and Section 5 describes our technique. Section 6 shows an example of BFI in Farsite, and Section 7 summarizes.

2 PREVIOUS WORK

In 1980, Pease et al. [23] introduced the problem of reaching agreement among a group of correctly functioning processors in the presence of arbitrarily faulty processors; they proved that a minimum of 3t + 1 processors are needed to tolerate t faults. Two years later, they christened this the Byzantine Generals Problem [14], as it has been known ever since. A large body of research has addressed the problem of Byzantine agreement, the first decade of which is well surveyed by Barborak and Malek [4].

In the mid-to-late 1990s, several researchers combined Byzantine-fault-tolerant protocols with state-machine replication [26] to produce toolkits such as Rampart [24], SecureRing [12], and BFT [6]. These toolkits provide a replication substrate for services written as deterministic state machines. The substrate guarantees that the service will operate correctly as long as fewer than a third of its replicas are faulty. An unfortunate property of these toolkits is that their throughput scales negatively: as the group size grows, the system throughput shrinks, which is the exact opposite of the behavior desired for scalable systems. Throughput scaling is non-positive because a non-decreasing fraction of replicas redundantly perform each computation. Throughput scaling is made negative by the message load of each processor, which is linear in group size. Some recent research has addressed the latter issue: Lewis and Saia [16] have developed a protocol that probabilistically reaches agreement if fewer than an
eighth of the replicas are faulty. The message workload of each processor is logarithmic in group size, so throughput scaling is less dramatically negative than for BFT. Abd-El-Malek et al. [1] have built a replicated state machine based on the Query/Update (Q/U) protocol, which requires 5t + 1 processors to tolerate t Byzantine faults. In theory, this protocol has zero throughput scaling with system size; however, their implementation exhibits negative throughput scaling, albeit at a lower rate than BFT.

The above systems all exhibit two properties: (1) non-positive throughput scaling and (2) all-or-nothing failure semantics, meaning that failures beyond the tolerated threshold can cause the entire system to fail.

In the absence of a Byzantine-fault-tolerant substrate that provides positive throughput scaling, researchers have built systems that partition their workload among multiple machines. However, as the system size grows, so does the expected number of faulty machines, which in turn (given all-or-nothing failure semantics) leads to an increasing likelihood of total system failure. We observe that this problem could be assuaged if there were some means to limit the spread of Byzantine faults.

Three avenues of research are related to this problem. First, several researchers have isolated Byzantine faults in distributed problems of academic interest, such as dining philosophers [21], vertex coloring [21], and edge coloring [25, 18]. Their solutions employ self-stabilizing protocols to guarantee that correct results are eventually obtained by all nodes that are beyond a specified distance (the "containment radius") from faulty nodes. The formal notion of fault containment for self-stabilizing systems was introduced by Ghosh et al. [10], who applied it only to transient faults. Such transient-fault containment was achieved by Demirbas et al. [8] for the problem of tracking in sensor networks. None of this research offers a broad approach to containing Byzantine faults.

Second, a number of researchers have investigated ways to limit Byzantine corruption when performing broadcast [3], multicast [22], or gossip [17, 20]. These closely related problems have no computational aspect; they merely propagate data. Furthermore, they have the property that correct operation implicitly replicates all of the data to all machines. The resulting redundancy enables machines to vote on the data's correctness, as in the original Byzantine agreement problem.

Third, some researchers have tackled specialized subclasses of the general problem. Merideth [19] proposes a proactive fault-containment system that relies on fault detection, a well-known specialization of fault-tolerance problems [7]. Krings and McQueen [13] employ standard Byzantine-fault-tolerant protocols only for carefully defined "critical functionalities." The TTP/C protocol [5] isolates only a constrained subset of Byzantine faults, namely reception failures and consistent transmission failures.

Thus, every known technique for building systems that resist Byzantine faults has at least one of the following weaknesses:
• Its throughput does not increase with scale.
• It addresses only a narrow academic problem.
• It does not support computation.
• It does not address general Byzantine faults.

3 CONTEXT – FARSITE FILE SYSTEM

We developed BFI in the context of a scalable, peer-to-peer file system called Farsite [2]. Farsite logically functions as a centralized file server, but it is physically distributed among the desktop workstations of a university or corporation, which may have over 10^5 machines. In this environment, independent Byzantine faults are significantly more likely than they would be in a physically secure server cluster.

Farsite employs different techniques for managing file content and metadata. File content is encrypted, replicated, and distributed among the machines in the system, and a secure hash of each file's content is maintained with its metadata. File metadata is managed by BFT groups of workstations; we call each group a "server". Every machine in a Farsite system fills three roles: a client for its local user, a file host storing encrypted content of data files, and a member of a BFT group that acts as a server for metadata.

File metadata is dynamically partitioned among servers, as follows: A Farsite system is initialized with a single server, called the root, which initially manages metadata for all files. When the metadata load on the root becomes excessive, it assembles a randomly chosen set of machines into a new BFT group, and it delegates a subset of its files to this newly formed server. This process continues as necessary, resulting in a tree of delegations. The fanout of the tree is a matter of policy.

The only difference between directories and data files is that the former may have no content and the latter may have no children. This is a small enough distinction that we refer to them both simply as "files".

4 MOTIVATION

This section argues for the value of isolating Byzantine faults in a scalable peer-to-peer system. In particular, we consider a distributed file system, specifically a Farsite-like system of interacting BFT groups. For analytical simplicity, we assume that the system's files are partitioned evenly among the constituent BFT groups, and we assume independent machine failure. For concreteness, we assume a machine fault probability of 0.001; in the analysis, we discuss our sensitivity to this value. We evaluate the operational fault rate, which is the probability that an operation on a randomly selected file exhibits a fault.
4.1 Model

If a third or more of the machines in a BFT group are faulty, the group cannot mask the fault. Therefore, if Pm is the probability that a machine is faulty, the probability that a group of size g manifests a fault is:

    Pg = 1 − B(⌊(g − 1)/3⌋, g, Pm)

Function B is the cumulative binomial distribution function. In a system of n BFT groups, the probability that at least one group manifests a fault is:

    Ps = 1 − B(0, n, Pg)

We consider three cases. In the first case, there is no fault-isolation mechanism, so a single faulty group may spread misinformation to other groups and thereby corrupt the entire system. The probability that any given file operation exhibits a fault is thus equal to the probability Ps that the system contains even one faulty group. This is shown by the three dashed lines in Fig. 1 for BFT groups of size 4, 7, and 10, as the count of groups scales up to 10^5.

The second case illustrates ideal fault isolation. Each BFT group can corrupt only operations on the files it manages, so the probability of a faulty file operation equals the probability Pg that the file's managing group is faulty. System scale is thus irrelevant, as illustrated by the dark line in Fig. 1 for 4-member BFT groups.

The third case illustrates BFI in Farsite. Pathname lookups involve metadata from all files along the path, so a faulty group can corrupt lookups to its descendant groups' files. Recall that the count of nodes in a tree with l levels and node fanout of f is:

    N(f, l) = (f^l − 1) / (f − 1)

In a tree of N(f, l) BFT groups, the expected number of groups that are faulty or have a faulty ancestor is defined by the recurrence:

    F(f, l) = Pg · N(f, l) + (1 − Pg) · f · F(f, l − 1)
    F(f, 1) = Pg

Thus, the probability of a faulty absolute-path-based file operation in a system of N(f, l) BFT groups is:

    Pt = F(f, l) / N(f, l)

This is illustrated by the light and medium lines in Fig. 1 for systems in which each 4-member BFT group has 4 or 16 children, respectively. Since not all file operations are path-based, and since paths can be specified relative to open files, the operational fault rate will actually be somewhat lower than that indicated by Fig. 1.

4.2 Analysis

As the light dashed line shows, a single 4-member group has an operational fault rate of 6×10^−6, which is better than five nines. However, when the scale reaches 10^5 groups, the operational fault rate is 0.45; almost half of all operations exhibit faults. By contrast, with a group fanout of 16, a 10^5-group Farsite system exhibits an operational fault rate of 3×10^−5, showing that fault tolerance is largely preserved as scale increases.

An alternate way to improve the large-scale operational fault rate is to increase the size of BFT groups. However, as Fig. 1 shows, in a system of 10^5 groups, the group size must be increased from 4 to 10 to achieve the operational fault rate that BFI achieves. This increase cuts the system throughput by 60% (= 1 − 4/10) or more, because it increases the redundant computation by a factor of 2.5 (= 10/4) and the intra-group message traffic by a factor of 6.25 (= 10^2/4^2).

The curves in Fig. 1 are based on a machine fault rate of 10^−3. For higher rates, the benefit of BFI is even greater, since a larger increase in group size (and corresponding drop in throughput) is needed to achieve the same operational fault rate as BFI. By contrast, a lower machine fault rate reduces the benefit, although not by much: even with a machine fault probability of 10^−5 in a system of 10^5 groups, BFI enables 4-member groups to achieve an operational fault rate that would otherwise require 7-member groups and an attendant drop in throughput of over 42% (= 1 − 4/7).
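The model above can be checked numerically. The following is a minimal sketch (the function names are ours, not from the paper) that computes Pg, Ps, and Pt for Pm = 0.001 and reproduces the headline numbers of Section 4.2: Pg ≈ 6×10^−6 for a single 4-member group, Ps ≈ 0.45 for 10^5 unisolated groups, and Pt ≈ 3×10^−5 for a fanout-16 tree of roughly 10^5 groups.

```python
from math import comb

def binom_cdf(k, n, p):
    # cumulative binomial distribution B(k, n, p) = P(X <= k), X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def group_fault_prob(g, pm):
    # a group of size g masks at most floor((g - 1) / 3) faulty members
    return 1 - binom_cdf((g - 1) // 3, g, pm)

def system_fault_prob(n, pg):
    # probability that at least one of n groups manifests a fault
    return 1 - binom_cdf(0, n, pg)

def tree_nodes(f, l):
    # N(f, l) = (f^l - 1) / (f - 1): nodes in a tree of l levels, fanout f
    return (f**l - 1) // (f - 1)

def faulty_or_descendant(f, l, pg):
    # F(f, l): expected number of groups that are faulty or have a faulty ancestor
    if l == 1:
        return pg
    return pg * tree_nodes(f, l) + (1 - pg) * f * faulty_or_descendant(f, l - 1, pg)

pm = 0.001
pg = group_fault_prob(4, pm)                              # ~6e-6
ps = system_fault_prob(10**5, pg)                         # ~0.45
pt = faulty_or_descendant(16, 5, pg) / tree_nodes(16, 5)  # ~3e-5; N(16, 5) = 69,905
```

Note that a fanout-16 tree of 5 levels has N(16, 5) = 69,905 groups, which is the closest tree size to the 10^5 scale discussed in the text.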
[Fig. 1: Faults at scale (machine fault probability = 0.001). Log-log plot of operational fault rate (1×10^−7 to 1×10^0) versus system scale (1 to 100,000 BFT groups). Dashed curves: BFT groups of size 4, 7, and 10 with no BFI; solid curves: 4-member BFT groups with tree(4) BFI, tree(16) BFI, and ideal BFI.]

5 BYZANTINE FAULT ISOLATION

BFI is a technique that prevents a faulty BFT group from corrupting an entire system. The technique is based on using formal specification to design a distributed system with well-defined semantics [15]. BFI semantically specifies the faulty behavior that can manifest when faults occur in the distributed system.

Several approaches may be used to validate that the system design adheres to the specified fault semantics. In our work, we used only informal proof, but greater confidence can be attained through model checking [15] or mechanically verified formal proof [9]. Our limited experience with such methods suggests that they would be challenging to apply to a formal spec as big as ours.
5.1 Formal Distributed System Specification

We follow an approach to formal specification advocated by Lamport [15]. This approach has three components: a semantic spec, a distributed-system spec, and a refinement.* Each of the two specs defines state information and actions that affect the state.

* Lamport uses different terminology because his approach is applicable to a broader class of problems than distributed systems.

A semantic specification describes how users experience the system. Farsite logically functions as a centralized file server, so Farsite's semantic spec defines the behavior of a centralized file server. The state defined by the semantic spec includes files, handles, and pending operations. The actions defined by the semantic spec are file-system operations: open, close, read, write, create, delete, and move/rename. For example, the semantic spec says that the open operation resolves a pathname, checks access rights to the file, adds a new handle to the set of handles open on the file, and returns the handle to the caller. The spec also describes the error codes returned if a pathname fails to resolve or the access rights are invalid. In sum, the semantic spec characterizes the system from the users' point of view.

A distributed-system specification describes how a set of machines collaborate to produce a desired behavior. The state defined by the distributed-system spec includes machines, abstract data structures maintained by machines, and messages exchanged between machines. The actions defined by the distributed-system spec are tasks performed by individual machines or BFT groups: sending messages, receiving messages, and modifying local-machine state. For example, the distributed-system spec says that when a server receives a message from a client asking for the necessary resources to perform an open operation, if the server has the resources available, it sends them to the client; otherwise, it begins the process of recalling the resources from other clients that are holding the resources. In sum, the distributed-system spec characterizes the system from the system designers' point of view.

A refinement is a formal correspondence between the semantic spec and the distributed-system spec. It describes how to interpret distributed-system state as semantic state and distributed-system actions as semantic actions. For a state example, the refinement may say that the attributes of a particular semantic-level file are defined by a particular data structure on a particular machine in the distributed system; the particulars express the system designer's rules for what data means and which data is authoritative.

Actions in the distributed-system spec are classified as either foreground or background actions, according to whether they have semantic effects. A foreground action corresponds to an action in the semantic spec; for example, updating a certain distributed-system data structure may correspond to the completion of an open operation in the semantic spec. A background action corresponds to a non-action in the semantic spec; for example, passing resources to a client by means of a message has no semantic effect.

The basic distributed-system spec defines interactions among a set of non-faulty machines. The basic semantic spec defines the user-visible behavior of a fault-free system. If the system design is correct, the refinement will show that the distributed-system spec satisfies the semantic spec.

5.2 Specifying Failure Semantics

To understand the system's behavior under fault conditions, we modify the distributed-system spec as follows: we augment the state of each machine with a new flag indicating whether the machine is corrupt, and we add a new action that sets a machine's corrupt flag. When a machine is corrupt, its local state is undefined, and it can send arbitrary messages to other machines at any time.

These modifications to the distributed-system spec prevent it from refining to the basic semantic spec. The BFI technique proceeds by progressively modifying the two specs until the distributed-system spec again satisfies the semantic spec:
• The semantic spec is weakened to describe how faults appear to the users of the system.
• The distributed-system spec is strengthened to quarantine faults from non-corrupt machines.

The art of Byzantine fault isolation is in determining what modifications to make to the two specs. An overly weak semantic spec may not isolate faults sufficiently; an overly strong distributed-system spec may not facilitate a practical and performant implementation; and a semantic spec that is too strong or a distributed-system spec that is too weak will not admit a satisfactory refinement.

In Farsite, we modified the semantic spec by associating a flag with each file to indicate whether the file is tainted, and we added an action that sets the flag. We established a refinement in which a file becomes tainted if and only if the BFT group that manages the file becomes corrupt.

It would have been ideal to weaken the semantic spec so little that tainted files could not corrupt operations on non-tainted files. Unfortunately, because path-based file operations involve metadata from all files along the path, we were unable to design a distributed system that refined to such a semantic spec. We thus had to weaken the semantic spec further, permitting tainted files to lie about their parents and children and thereby to corrupt path-based operations on the tainted file's descendants.
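The tainted-file refinement described above can be illustrated with a toy model. This is a sketch under our own naming assumptions (`Machine`, `BFTGroup`, and `refine_tainted` are illustrative identifiers, not Farsite's): the distributed-system state gains a per-machine corrupt flag plus an action that sets it, and the refinement marks a file tainted exactly when its managing BFT group can no longer mask its corrupt members.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    corrupt: bool = False  # new distributed-system state added for BFI

@dataclass
class BFTGroup:
    members: list  # Machines forming the replicated-state-machine group
    files: set     # names of the files whose metadata this group manages

    def manifests_fault(self):
        # a group cannot mask a fault once a third or more of its members are corrupt
        bad = sum(m.corrupt for m in self.members)
        return 3 * bad >= len(self.members)

def corrupt_machine(machine):
    # the new spec action: after this, the machine's local state is undefined
    machine.corrupt = True

def refine_tainted(groups):
    # refinement: a file is tainted iff the BFT group managing it is corrupt
    return {f for g in groups if g.manifests_fault() for f in g.files}

# usage: two corrupt members push a 4-member group past its masking threshold
g1 = BFTGroup([Machine() for _ in range(4)], {"/a", "/a/b"})
g2 = BFTGroup([Machine() for _ in range(4)], {"/a/c"})
corrupt_machine(g1.members[0])
corrupt_machine(g1.members[1])
assert refine_tainted([g1, g2]) == {"/a", "/a/b"}
```

The one-third threshold mirrors Section 4.1: with one corrupt member, the 4-member group still masks the fault and no file becomes tainted.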
We were, however, able to constrain those lies to prevent a tainted file from affecting operations on files elsewhere in the namespace. In particular, a tainted file cannot falsely claim an existing non-tainted file as its child or parent. Exhaustively, the weakened safety semantics allow a tainted file to appear to…
(1) …have arbitrary contents and attributes,
(2) …not be linked into the file tree,
(3) …not have children that it actually has,
(4) …have children that do not actually exist, or
(5) …have another tainted file as a child or parent.
In addition, the weakened liveness semantics allow operations involving a tainted file to not complete.

The modifications to Farsite's distributed-system spec are far more involved. Some of the key principles include maintaining redundant information across BFT group boundaries, augmenting messages with information that justifies their correctness, ensuring unambiguous chains of authority over data, and carefully ordering messages and state updates for operations involving more than two BFT groups. We illustrate an example in the next section.

6 BFI EXAMPLE: MOVE OPERATION

Our BFI modifications to the distributed-system spec are too extensive to permit even a high-level summary in this paper. Nonetheless, to convey the flavor of these modifications, we outline one example of how we modified Farsite's distributed-system spec to satisfy the semantic spec. The example exploits a redundancy that we added for BFI, namely that parent-child links are stored by both parent and child files. If a client observes inconsistent information about the link between a parent and a child, the client regards that link as nonexistent.

The specific example we present is a component of the most complex file-system operation, move, whose semantics are that an object file has its parent changed from a source file to a destination file, thereby changing the full pathnames of all descendants of the object file. The object, source, and destination files might be managed by three separate servers, each of which may be faulty or not. We will not present our full case analysis here, but we will describe the highlights.

If all non-faulty servers agree on whether the move succeeds, the refinement defines the semantic result as the result obtained by the non-faulty servers. Detailed analysis shows that this satisfies the fault semantics enumerated above. This rule also covers the case in which all servers are faulty, because then any semantic result is consistent with the fault semantics. If no servers are faulty, our protocol ensures that all servers agree on the result.

The tricky cases are those in which one server is faulty and the other two disagree on the result. It would be ideal to somehow prevent these cases from ever occurring, but this is provably impossible [23]. As Table 1 shows, for each case (faulty object, faulty source, and faulty destination), refinement can select a satisfactory semantic result if the other servers disagree in a particular way (the a subcases) but not if they disagree in the opposite way (the b subcases).

Case  Object   Source   Dest.    Semantic
1a    faulty   success  failure  either
1b    faulty   failure  success  none
2a    success  faulty   failure  failure
2b    failure  faulty   success  none
3a    failure  success  faulty   success
3b    success  failure  faulty   none

Table 1: Fault case analysis for move operation

For example, in case 1, the object server is faulty. In subcase a, the source believes the move to be successful so it unlinks the object, but the destination believes the move to be unsuccessful so it fails to link the object. Consequently, the object becomes unlinked from the file tree. However, safety weakness 2 (see previous section) states that a tainted file may appear to not be linked into the file tree. Thus, our refinement could interpret the outcome either as a tainted file successfully moving and then not appearing linked into the tree or as a tainted file failing to move and then not appearing linked into the tree.

The analysis of the a subcases for cases 2 and 3 is similar albeit slightly more complex, because 2a must be interpreted as a failed move and 3a must be interpreted as a successful move, to be consistent with the safety weaknesses allowed by the failure semantics.

In the b subcases, no semantic result is consistent with the distributed-system outcome. For example, in subcase 1b, the destination links the object but the source does not unlink it. If the refinement were to interpret this as a successful move, the destination file would become the object's parent, but because the source thinks the move failed, it still believes that it is the object's parent, so the object could pretend that the source is in fact its parent, which our failure semantics disallow. A similar argument holds for interpreting the result as a failed move. Since it is impossible to refine the subcase b outcomes, Farsite must prevent them.

Our protocol precludes the b subcases by ensuring that the non-faulty servers observe certain ordering constraints before declaring a move operation successful: the object server does not commit until after the source server commits, and the destination server does not commit until after the source and object servers commit. Examination of the table shows that this prevents the problematic subcases. For example, subcase 1b cannot occur because the destination will not declare success until it first hears that the source has declared success, which it has not.
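The ordering constraints above can be checked mechanically. The sketch below is our own abstraction of the commit rule (commit order source, then object, then destination; a non-faulty server declares success only after every non-faulty predecessor has, since a faulty predecessor may falsely claim success and therefore never blocks anything). It confirms that the a subcases of Table 1 remain reachable while the b subcases cannot occur:

```python
# Required commit order for a move: source -> object -> destination.
ORDER = ["source", "object", "dest"]

def reachable(faulty, outcomes):
    """Can these declared results arise, given that `faulty` is the one bad server?

    `outcomes` maps each of the two non-faulty servers to "success" or "failure".
    A non-faulty server declares success only if every non-faulty predecessor
    in ORDER has declared success; a faulty predecessor may lie, so it is skipped.
    """
    for server, result in outcomes.items():
        if result != "success":
            continue  # declaring failure is always permitted
        for pred in ORDER[:ORDER.index(server)]:
            if pred != faulty and outcomes[pred] != "success":
                return False  # a non-faulty predecessor has not committed
    return True

# Table 1: the a subcases admit a refinement; the b subcases must be impossible.
a_cases = [("object", {"source": "success", "dest": "failure"}),   # 1a
           ("source", {"object": "success", "dest": "failure"}),   # 2a
           ("dest",   {"object": "failure", "source": "success"})] # 3a
b_cases = [("object", {"source": "failure", "dest": "success"}),   # 1b
           ("source", {"object": "failure", "dest": "success"}),   # 2b
           ("dest",   {"object": "success", "source": "failure"})] # 3b

assert all(reachable(f, o) for f, o in a_cases)
assert not any(reachable(f, o) for f, o in b_cases)
```

For instance, in subcase 1b the destination's declared success requires the non-faulty source to have declared success, which it has not, so `reachable` returns False.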
7 SUMMARY

Although Byzantine Fault Tolerance (BFT) allows a trusted service to run on a peer-to-peer system of untrusted machines, it does not support scaling up to increase system throughput. Scale-up can be achieved by partitioning a workload among multiple BFT groups, but this leads to an increase in the probability of total system failure as the system scale increases.

To solve this problem, we introduced Byzantine Fault Isolation (BFI), a methodology for using formal specification to constrain the semantic behavior of a faulty system. BFI modifies a system design to formally recognize that machines may become corrupt, wherein they have undefined local state and can send arbitrary messages to other machines. These modifications highlight areas that require design changes to restrict the spread of faulty information.

Even in the presence of design features that restrict faults, corrupt machines may still affect the system's semantics. Thus, the BFI technique involves modifying the system's defined semantics to reflect the partial correctness achievable when the system is degraded. BFI uses formal specification to ensure that the modified system design satisfies the modified system semantics.

We quantified the benefit of BFI to scalable systems: in a tree-structured system of 10^5 BFT groups, wherein a faulty group can corrupt its descendants' operations, BFI can enable 4-member BFT groups to achieve the same operational fault rate as 10-member BFT groups, without the corresponding 60% drop in throughput due to increased replication and message traffic.

We employed the BFI technique in the design of the Farsite distributed file system, a large and complex peer-to-peer system designed to scale to 10^5 machines. BFI guided us toward adding specific redundancies, enriching messages, restricting authority, and constraining the order of distributed operation steps. Using BFI, we related these design changes to the system's semantics, thereby showing that file corruption cannot spread to unrelated regions of the file-system namespace.

Prior to BFI, no technique had addressed how to interconnect multiple BFT groups in a way that isolates Byzantine faults.

REFERENCES

[1] M. Abd-El-Malek, G. Ganger, G. Goodson, M. Reiter, J. Wylie. "Fault-scalable Byzantine Fault-Tolerant Services." 20th SOSP, 2005.
[2] A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, R. P. Wattenhofer. "FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment." 5th OSDI, 2002.
[3] Y. Azar, S. Kutten, B. Patt-Shamir. "Distributed Error Confinement." PODC, 2003.
[4] M. Barborak, M. Malek. "The Consensus Problem in Fault-Tolerant Computing." ACM Computing Surveys, 25(2), 1993.
[5] G. Bauer, H. Kopetz, W. Steiner. "Byzantine Fault Containment in TTP/C." Workshop on Real-Time LANs in the Internet Age, 2002.
[6] M. Castro, B. Liskov. "Practical Byzantine Fault Tolerance." 3rd OSDI, 1999.
[7] T. D. Chandra, S. Toueg. "Unreliable Failure Detectors for Asynchronous Systems." JACM, 43(2), 1996.
[8] M. Demirbas, A. Arora, T. Nolte, N. Lynch. "A Hierarchy-Based Fault-Local Stabilizing Algorithm for Tracking in Sensor Networks." 8th OPODIS, 2004.
[9] U. Engberg, P. Grønning, L. Lamport. "Mechanical Verification of Concurrent Systems with TLA." 4th Computer Aided Verification, 1992.
[10] S. Ghosh, A. Gupta, T. Herman, S. V. Pemmaraju. "Fault-Containing Self-Stabilizing Algorithms." PODC, 1996.
[11] M. P. Herlihy, J. M. Wing. "Specifying Graceful Degradation." IEEE TPDS, 2(1), 1991.
[12] K. P. Kihlstrom, L. E. Moser, P. M. Melliar-Smith. "The SecureRing Protocols for Securing Group Communication." Hawaii International Conference on System Sciences, 1998.
[13] A. W. Krings, M. A. McQueen. "A Byzantine Resilient Approach to Network Security." 29th International Symposium on Fault-Tolerant Computing, 1999.
[14] L. Lamport, R. Shostak, M. Pease. "The Byzantine Generals Problem." ACM TOPLAS, 4(3), 1982.
[15] L. Lamport. Specifying Systems. Addison-Wesley, 2003.
[16] C. S. Lewis, J. Saia. "Scalable Byzantine Agreement." Technical Report, University of New Mexico, 2004.
[17] D. Malkhi, Y. Mansour, M. K. Reiter. "On Diffusing Updates in a Byzantine Environment." 18th SRDS, 1999.
[18] T. Masuzawa, S. Tixeuil. "A Self-Stabilizing Link-Coloring Protocol Resilient to Unbounded Byzantine Faults in Arbitrary Networks." Laboratoire de Recherche en Informatique Report #1396, 2005.
[19] M. G. Merideth. "Enhancing Survivability with Proactive Fault-Containment." DSN Student Forum, 2003.
[20] Y. M. Minsky, F. B. Schneider. "Tolerating Malicious Gossip." Distributed Computing, 16(1), 2003.
[21] M. Nesterenko, A. Arora. "Tolerance to Unbounded Byzantine Faults." SRDS, 2002.
[22] V. Pappas, B. Zhang, A. Terzis, L. Zhang. "Fault-Tolerant Data Delivery for Multicast Overlay Networks." ICDCS, 2004.
[23] M. Pease, R. Shostak, L. Lamport. "Reaching Agreement in the Presence of Faults." JACM, 27(2), 1980.
[24] M. K. Reiter. "The Rampart Toolkit for Building High-Integrity Services." TPDS (LNCS 938), 1995.
[25] Y. Sakurai, F. Ooshita, T. Masuzawa. "A Self-Stabilizing Link-Coloring Protocol Resilient to Byzantine Faults in Tree Networks." 8th OPODIS, 2004.
[26] F. Schneider. "Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial." ACM Computing Surveys, 22(4), 1990.
