Byzantine Fault Isolation in the Farsite Distributed File System
John R. Douceur and Jon Howell
                                                         Microsoft Research
                                                   {johndo, howell}@microsoft.com
ABSTRACT

In a peer-to-peer system of interacting Byzantine-fault-tolerant replicated-state-machine groups, as system scale increases, so does the probability that a group will manifest a fault. If no steps are taken to prevent faults from spreading among groups, a single fault can result in total system failure. To address this problem, we introduce Byzantine Fault Isolation (BFI), a technique that enables a distributed system to operate with application-defined partial correctness when some of its constituent groups are faulty. We quantify BFI's benefit and describe its use in Farsite, a peer-to-peer file system designed to scale to 100,000 machines.

1 INTRODUCTION

Farsite [2] is a distributed peer-to-peer file system that runs on a network of desktop workstations and provides centralized file-system service. File storage in Farsite is secure, even though the system runs on unsecured machines. This security is established, in part, through the use of Byzantine Fault Tolerance (BFT), a well-known mechanism for building trusted services from untrusted components [14, 6]. BFT enables a service to continue functioning correctly as long as fewer than a third of the machines it is running on are faulty.

Farsite is designed to be scalable. As more and more workstations make use of the file-system service, the resources of these workstations become available for use by the system. However, the BFT technique cannot exploit these additional resources to achieve the greater throughput demanded by an increasing user base: Adding machines to a BFT group decreases throughput, rather than increasing it.

To achieve an increase in throughput with scale, Farsite partitions its workload among multiple interacting BFT groups. Unfortunately, as the count of BFT groups increases, so too does the probability that some group will contain enough faulty machines that the group will be unable to suppress the fault. If the system design does not account for the failure of one or more BFT groups, a single group failure can cause the entire system to fail.

The alternative to total failure is degraded operation, wherein individual group failures cause the system to operate in a way that is still partially correct, rather than completely unspecified. However, "partial correctness" is not something that can be defined in an application-independent fashion [11]. It is thus not possible to build a generic substrate that enables an arbitrary service to degrade gracefully in a meaningful and useful manner.

We have therefore developed a methodology for designing a distributed system of BFT groups, wherein a faulty group is prevented from corrupting the entire system. We call our method Byzantine Fault Isolation (BFI). BFI makes use of formal specification [15] to constrain the semantic behavior of a faulty system. The technique involves selectively weakening the system semantics and concomitantly strengthening the system design. Formal specification helps determine when these two processes have satisfactorily converged.

The next section surveys previous approaches to resisting Byzantine faults. Section 3 describes the Farsite file system, the context for our work. Section 4 quantifies the value of isolating Byzantine faults, and Section 5 describes our technique. Section 6 shows an example of BFI in Farsite, and Section 7 summarizes.

2 PREVIOUS WORK

In 1980, Pease et al. [23] introduced the problem of reaching agreement among a group of correctly functioning processors in the presence of arbitrarily faulty processors; they proved that a minimum of 3t + 1 processors are needed to tolerate t faults. Two years later, they christened this the Byzantine Generals Problem [14], as it has been known ever since. A large body of research has addressed the problem of Byzantine agreement, the first decade of which is well surveyed by Barborak and Malek [4].

In the mid-to-late 1990's, several researchers combined Byzantine-fault-tolerant protocols with state-machine replication [26] to produce toolkits such as Rampart [24], SecureRing [12], and BFT [6]. These toolkits provide a replication substrate for services written as deterministic state machines. The substrate guarantees that the service will operate correctly as long as fewer than a third of its replicas are faulty. An unfortunate property of these toolkits is that their throughput scales negatively: As the group size grows, the system throughput shrinks, which is the exact opposite of the behavior desired for scalable systems.

Throughput scaling is non-positive because a non-decreasing fraction of replicas redundantly perform each computation. Throughput scaling is made negative by the message load of each processor, which is linear in group size. Some recent research has addressed the latter issue: Lewis and Saia [16] have developed a protocol that probabilistically reaches agreement if fewer than an eighth of the replicas are faulty. The message workload
of each processor is logarithmic in group size, so throughput scaling is less dramatically negative than for BFT. Abd-El-Malek et al. [1] have built a replicated state machine based on the Query/Update (Q/U) protocol, which requires 5t + 1 processors to tolerate t Byzantine faults. In theory, this protocol has zero throughput scaling with system size; however, their implementation exhibits negative throughput scaling, albeit at a lower rate than BFT.

The above systems all exhibit two properties: (1) non-positive throughput scaling and (2) all-or-nothing failure semantics, meaning that failures beyond the tolerated threshold can cause the entire system to fail.

In the absence of a Byzantine-fault-tolerant substrate that provides positive throughput scaling, researchers have built systems that partition their workload among multiple machines. However, as the system size grows, so does the expected number of faulty machines, which in turn – given all-or-nothing failure semantics – leads to an increasing likelihood of total system failure. We observe that this problem could be assuaged if there were some means to limit the spread of Byzantine faults.

Three avenues of research are related to this problem: First, several researchers have isolated Byzantine faults in distributed problems of academic interest, such as dining philosophers [21], vertex coloring [21], and edge coloring [25, 18]. Their solutions employ self-stabilizing protocols to guarantee that correct results are eventually obtained by all nodes that are beyond a specified distance (the "containment radius") from faulty nodes. The formal notion of fault containment for self-stabilizing systems was introduced by Ghosh et al. [10], who applied it only to transient faults. Such transient-fault containment was achieved by Demirbas et al. [8] for the problem of tracking in sensor networks. None of this research offers a broad approach to containing Byzantine faults.

Second, a number of researchers have investigated ways to limit Byzantine corruption when performing broadcast [3], multicast [22], or gossip [17, 20]. These closely related problems have no computational aspect; they merely propagate data. Furthermore, they have the property that correct operation implicitly replicates all of the data to all machines. The resulting redundancy enables machines to vote on the data's correctness, as in the original Byzantine agreement problem.

Third, some researchers have tackled specialized subclasses of the general problem. Merideth [19] proposes a proactive fault-containment system that relies on fault detection, a well-known specialization of fault-tolerance problems [7]. Krings and McQueen [13] employ standard Byzantine-fault-tolerant protocols only for carefully defined "critical functionalities." The TTP/C protocol [5] isolates only a constrained subset of Byzantine faults, namely reception failures and consistent transmission failures.

Thus, every known technique for building systems that resist Byzantine faults has at least one of the following weaknesses:
• Its throughput does not increase with scale.
• It addresses only a narrow academic problem.
• It does not support computation.
• It does not address general Byzantine faults.

3 CONTEXT – FARSITE FILE SYSTEM

We developed BFI in the context of a scalable, peer-to-peer file system called Farsite [2]. Farsite logically functions as a centralized file server, but it is physically distributed among the desktop workstations of a university or corporation, which may have over 10^5 machines. In this environment, independent Byzantine faults are significantly more likely than they would be in a physically secure server cluster.

Farsite employs different techniques for managing file content and metadata. File content is encrypted, replicated, and distributed among the machines in the system, and a secure hash of each file's content is maintained with its metadata. File metadata is managed by BFT groups of workstations; we call each group a "server". Every machine in a Farsite system fills three roles: a client for its local user, a file host storing encrypted content of data files, and a member of a BFT group that acts as a server for metadata.

File metadata is dynamically partitioned among servers, as follows: A Farsite system is initialized with a single server, called the root, which initially manages metadata for all files. When the metadata load on the root becomes excessive, it assembles a randomly chosen set of machines into a new BFT group, and it delegates a subset of its files to this newly formed server. This process continues as necessary, resulting in a tree of delegations. The fanout of the tree is a matter of policy.

The only difference between directories and data files is that the former may have no content and the latter may have no children. This is a small enough distinction that we refer to them both simply as "files".

4 MOTIVATION

This section argues for the value of isolating Byzantine faults in a scalable peer-to-peer system. In particular, we consider a distributed file system, specifically a Farsite-like system of interacting BFT groups. For analytical simplicity, we assume that the system's files are partitioned evenly among the constituent BFT groups, and we assume independent machine failure. For concreteness, we assume a machine fault probability of 0.001; in the analysis, we discuss our sensitivity to this value. We evaluate the operational fault rate, which is the probability that an operation on a randomly selected file exhibits a fault.
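These assumptions are easy to exercise directly in simulation. The sketch below (illustrative Python of our own, not part of Farsite; the function names are hypothetical) estimates the probability that a single BFT group fails to mask its faults, under the independent-machine-failure assumption, and checks the estimate against the exact binomial value:

```python
import random
from math import comb

def exact_group_fault(g, pm):
    """P(a BFT group of size g cannot mask its faults): more than
    floor((g - 1)/3) of its members are faulty (the BFT threshold)."""
    t = (g - 1) // 3
    return sum(comb(g, k) * pm**k * (1 - pm)**(g - k) for k in range(t + 1, g + 1))

def simulated_group_fault(g, pm, trials, rng):
    """Monte Carlo estimate: draw each machine's fault independently."""
    t = (g - 1) // 3
    faulty_groups = sum(
        sum(rng.random() < pm for _ in range(g)) > t for _ in range(trials)
    )
    return faulty_groups / trials

rng = random.Random(42)
# With the paper's pm = 0.001 the event is too rare to simulate cheaply,
# so this demo inflates pm; the estimate tracks the exact value.
pm, g = 0.05, 4
est = simulated_group_fault(g, pm, 200_000, rng)
exact = exact_group_fault(g, pm)   # P(>= 2 of 4 faulty) ≈ 0.014
```

The closed-form model in the next section computes the same quantity analytically, for the realistic fault probability.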
4.1 Model

If a third or more of the machines in a BFT group are faulty, the group cannot mask the fault. Therefore, if Pm is the probability that a machine is faulty, the probability that a group of size g manifests a fault is:

    Pg = 1 − B(⌊(g − 1)/3⌋, g, Pm)

Function B is the cumulative binomial distribution function. In a system of n BFT groups, the probability that at least one group manifests a fault is:

    Ps = 1 − B(0, n, Pg)

We consider three cases. In the first case, there is no fault-isolation mechanism, so a single faulty group may spread misinformation to other groups and thereby corrupt the entire system. The probability that any given file operation exhibits a fault is thus equal to the probability Ps that the system contains even one faulty group. This is shown by the three dashed lines in Fig. 1 for BFT groups of size 4, 7, and 10, as the count of groups scales up to 10^5.

The second case illustrates ideal fault isolation. Each BFT group can corrupt only operations on the files it manages, so the probability of a faulty file operation equals the probability Pg that the file's managing group is faulty. System scale is thus irrelevant, as illustrated by the dark line in Fig. 1 for 4-member BFT groups.

The third case illustrates BFI in Farsite. Pathname lookups involve metadata from all files along the path, so a faulty group can corrupt lookups to its descendent groups' files. Recall that the count of nodes in a tree with l levels and node fanout of f is:

    N(f, l) = (f^l − 1) / (f − 1)

In a tree of N(f, l) BFT groups, the expected number of groups that are faulty or have a faulty ancestor is defined by the recurrence:

    F(f, l) = Pg · N(f, l) + (1 − Pg) · f · F(f, l − 1)
    F(f, 1) = Pg

Thus, the probability of a faulty absolute-path-based file operation in a system of N(f, l) BFT groups is:

    Pt = F(f, l) / N(f, l)

This is illustrated by the light and medium lines in Fig. 1 for systems in which each 4-member BFT group has 4 or 16 children, respectively. Since not all file operations are path-based, and since paths can be specified relative to open files, the operational fault rate will actually be somewhat lower than that indicated by Fig. 1.

4.2 Analysis

As the light dashed line shows, a single 4-member group has an operational fault rate of 6×10^−6, which is better than five nines. However, when the scale reaches 10^5 groups, the operational fault rate is 0.45; almost half of all operations exhibit faults. By contrast, with a group fanout of 16, a 10^5-group Farsite system exhibits an operational fault rate of 3×10^−5, showing that fault tolerance is largely preserved as scale increases.

An alternate way to improve the large-scale operational fault rate is to increase the size of BFT groups. However, as Fig. 1 shows, in a system of 10^5 groups, the group size must be increased from 4 to 10 to achieve the operational fault rate that BFI achieves. This increase cuts the system throughput by 60% (= 1 − 4/10) or more, because it increases the redundant computation by a factor of 2.5 (= 10/4) and the intra-group message traffic by a factor of 6.25 (= 10² / 4²).

The curves in Fig. 1 are based on a machine fault rate of 10^−3. For higher rates, the benefit of BFI is even greater, since a larger increase in group size (and corresponding drop in throughput) is needed to achieve the same operational fault rate as BFI. By contrast, a lower machine fault rate reduces the benefit, although not by much: Even with a machine fault probability of 10^−5 in a system of 10^5 groups, BFI enables 4-member groups to achieve an operational fault rate that would otherwise require 7-member groups and an attendant drop in throughput of over 42% (= 1 − 4/7).
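The model is easy to evaluate numerically. The following standard-library Python sketch (our own illustration; the function and variable names are not from the paper) computes Pg, Ps, and Pt and reproduces the headline numbers of the analysis:

```python
from math import comb

def binom_cdf(k, n, p):
    """Cumulative binomial distribution B(k, n, p): P(X <= k), X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def group_fault_prob(g, pm):
    """Pg: a group of size g manifests a fault when a third or more are faulty."""
    return 1 - binom_cdf((g - 1) // 3, g, pm)

def system_fault_prob(n, pg):
    """Ps: at least one of n groups manifests a fault (no isolation)."""
    return 1 - binom_cdf(0, n, pg)

def tree_fault_rate(f, l, pg):
    """Pt = F(f, l) / N(f, l): fault rate for absolute-path operations."""
    n = lambda depth: (f**depth - 1) // (f - 1)     # N(f, l)
    F = pg                                          # F(f, 1) = Pg
    for depth in range(2, l + 1):
        F = pg * n(depth) + (1 - pg) * f * F        # the recurrence above
    return F / n(l)

pm = 0.001
pg = group_fault_prob(4, pm)        # ~6e-6 for a 4-member group
ps = system_fault_prob(10**5, pg)   # ~0.45: no isolation, 1e5 groups
pt = tree_fault_rate(16, 5, pg)     # ~3e-5: BFI, fanout 16 (~7e4 groups)
```

With Pm = 0.001, these values match the analysis: roughly 6×10^−6 per group, 0.45 for 10^5 unisolated groups, and 3×10^−5 for a fanout-16 delegation tree.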
[Fig. 1: Faults at scale (machine fault probability = 0.001). Log-log plot of operational fault rate (1×10^−7 to 1×10^0) versus system scale (1 to 100,000 BFT groups), with curves for: BFT 4, no BFI; BFT 7, no BFI; BFT 10, no BFI; BFT 4, tree (4) BFI; BFT 4, tree (16) BFI; and BFT 4, ideal BFI.]

5 BYZANTINE FAULT ISOLATION

BFI is a technique that prevents a faulty BFT group from corrupting an entire system. The technique is based on using formal specification to design a distributed system with well-defined semantics [15]. BFI semantically specifies the faulty behavior that can manifest when faults occur in the distributed system.

Several approaches may be used to validate that the system design adheres to the specified fault semantics. In our work, we used only informal proof, but greater confidence can be attained through model checking [15] or mechanically verified formal proof [9]. Our limited experience with such methods suggests that they would be challenging to apply to a formal spec as big as ours.
5.1 Formal Distributed System Specification

We follow an approach to formal specification advocated by Lamport [15]. This approach has three components: a semantic spec, a distributed-system spec, and a refinement.* Each of the two specs defines state information and actions that affect the state.

    * Lamport uses different terminology because his approach is applicable to a broader class of problems than distributed systems.

A semantic specification describes how users experience the system. Farsite logically functions as a centralized file server, so Farsite's semantic spec defines the behavior of a centralized file server. The state defined by the semantic spec includes files, handles, and pending operations. The actions defined by the semantic spec are file-system operations: open, close, read, write, create, delete, and move/rename. For example, the semantic spec says that the open operation resolves a pathname, checks access rights to the file, adds a new handle to the set of handles open on the file, and returns the handle to the caller. The spec also describes the error codes returned if a pathname fails to resolve or the access rights are invalid. In sum, the semantic spec characterizes the system from the users' point of view.

A distributed-system specification describes how a set of machines collaborate to produce a desired behavior. The state defined by the distributed-system spec includes machines, abstract data structures maintained by machines, and messages exchanged between machines. The actions defined by the distributed-system spec are tasks performed by individual machines or BFT groups: sending messages, receiving messages, and modifying local-machine state. For example, the distributed-system spec says that when a server receives a message from a client asking for the necessary resources to perform an open operation, if the server has the resources available, it sends them to the client; otherwise, it begins the process of recalling the resources from other clients that are holding the resources. In sum, the distributed-system spec characterizes the system from the system designers' point of view.

A refinement is a formal correspondence between the semantic spec and the distributed-system spec. It describes how to interpret distributed-system state as semantic state and distributed-system actions as semantic actions. For a state example, the refinement may say that the attributes of a particular semantic-level file are defined by a particular data structure on a particular machine in the distributed system; the particulars express the system designer's rules for what data means and which data is authoritative.

Actions in the distributed-system spec are classified as either foreground or background actions, according to whether they have semantic effects. A foreground action corresponds to an action in the semantic spec; for example, updating a certain distributed-system data structure may correspond to the completion of an open operation in the semantic spec. A background action corresponds to a non-action in the semantic spec; for example, passing resources to a client by means of a message has no semantic effect.

The basic distributed-system spec defines interactions among a set of non-faulty machines. The basic semantic spec defines the user-visible behavior of a fault-free system. If the system design is correct, the refinement will show that the distributed-system spec satisfies the semantic spec.

5.2 Specifying Failure Semantics

To understand the system's behavior under fault conditions, we modify the distributed-system spec as follows: We augment the state of each machine with a new flag indicating whether the machine is corrupt, and we add a new action that sets a machine's corrupt flag. When a machine is corrupt, its local state is undefined, and it can send arbitrary messages to other machines at any time.

These modifications to the distributed-system spec prevent it from refining to the basic semantic spec. The BFI technique proceeds by progressively modifying the two specs until the distributed-system spec again satisfies the semantic spec:
• The semantic spec is weakened to describe how faults appear to the users of the system.
• The distributed-system spec is strengthened to quarantine faults from non-corrupt machines.

The art of Byzantine fault isolation is in determining what modifications to make to the two specs. An overly weak semantic spec may not isolate faults sufficiently; an overly strong distributed-system spec may not facilitate a practical and performant implementation; and a semantic spec that is too strong or a distributed-system spec that is too weak will not admit a satisfactory refinement.

In Farsite, we modified the semantic spec by associating a flag with each file to indicate whether the file is tainted, and we added an action that sets the flag. We established a refinement in which a file becomes tainted if and only if the BFT group that manages the file becomes corrupt.

It would have been ideal to weaken the semantic spec so little that tainted files could not corrupt operations on non-tainted files. Unfortunately, because path-based file operations involve metadata from all files along the path, we were unable to design a distributed system that refined to such a semantic spec. We thus had to weaken the semantic spec further, permitting tainted files to lie about their parents and children and thereby to corrupt path-based operations on the tainted file's descendents.
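One way to picture the corrupt flag, the tainted flag, and the refinement between them is as two state sets plus an interpretation function. The sketch below is our own illustrative Python (the real Farsite specs are written formally and are far richer than the two flags modeled here; all names are hypothetical):

```python
# Distributed-system spec state (sketch): which BFT group manages each
# file's metadata, and which groups have taken the "become corrupt" action.
dist_state = {
    "manager": {"/": "g0", "/a": "g1", "/a/b": "g1", "/c": "g2"},
    "corrupt": set(),
}

def corrupt_group(state, group):
    """The added distributed-system action: set a group's corrupt flag.
    Once set, the group's local state is undefined (not modeled here)."""
    state["corrupt"].add(group)

def refine(state):
    """Refinement (sketch): interpret distributed-system state as semantic
    state. The weakened semantic spec adds a per-file tainted flag, and the
    refinement sets it iff the file's managing BFT group is corrupt."""
    return {f: (g in state["corrupt"]) for f, g in state["manager"].items()}

corrupt_group(dist_state, "g1")
semantic = refine(dist_state)
# Files managed by g1 become tainted; files of non-corrupt groups do not.
```

The interesting engineering, described next, is in constraining what tainted files may appear to do.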
We were, however, able to constrain those lies to prevent a tainted file from affecting operations on files elsewhere in the namespace. In particular, a tainted file cannot falsely claim an existing non-tainted file as its child or parent. Exhaustively, the weakened safety semantics allow a tainted file to appear to…
(1) …have arbitrary contents and attributes,
(2) …not be linked into the file tree,
(3) …not have children that it actually has,
(4) …have children that do not actually exist, or
(5) …have another tainted file as a child or parent.
In addition, the weakened liveness semantics allow operations involving a tainted file to not complete.

The modifications to Farsite's distributed-system spec are far more involved. Some of the key principles include maintaining redundant information across BFT group boundaries, augmenting messages with information that justifies their correctness, ensuring unambiguous chains of authority over data, and carefully ordering messages and state updates for operations involving more than two BFT groups. We illustrate an example in the next section.

6 BFI EXAMPLE: MOVE OPERATION

Our BFI modifications to the distributed-system spec are too extensive to permit even a high-level summary in this paper. Nonetheless, to convey the flavor of these modifications, we outline one example of how we modified Farsite's distributed-system spec to satisfy the semantic spec. The example exploits a redundancy that we added for BFI, namely that parent-child links are stored by both parent and child files. If a client observes inconsistent information about the link between a parent and a child, the client regards that link as nonexistent.

The specific example we present is a component of the most complex file-system operation, move, whose semantics are that an object file has its parent changed from a source file to a destination file, thereby changing the full pathnames of all descendents of the object file.

The object, source, and destination files might be managed by three separate servers, each of which may be faulty or not. We will not present our full case analysis here, but we will describe the highlights.

If all non-faulty servers agree on whether the move succeeds, the refinement defines the semantic result as the result obtained by the non-faulty servers. Detailed analysis shows that this satisfies the fault semantics enumerated above. This rule also covers the case in which all servers are faulty, because then any semantic result is consistent with the fault semantics.

If no servers are faulty, our protocol ensures that all servers agree on the result.

The tricky cases are those in which one server is faulty and the other two disagree on the result. It would be ideal to somehow prevent these cases from ever occurring, but this is provably impossible [23]. As Table 1 shows, for each case (faulty object, faulty source, and faulty destination), refinement can select a satisfactory semantic result if the other servers disagree in a particular way (the a subcases) but not if they disagree in the opposite way (the b subcases).

Table 1
Case    Object    Source    Dest.     Semantic
1a      faulty    success   failure   either
1b      faulty    failure   success   none
2a      success   faulty    failure   failure
2b      failure   faulty    success   none

For example, in case 1, the object server is faulty. In subcase a, the source believes the move to be successful so it unlinks the object, but the destination believes the move to be unsuccessful so it fails to link the object. Consequently, the object becomes unlinked from the file tree. However, safety weakness 2 (see previous section) states that a tainted file may appear to not be linked into the file tree. Thus, our refinement could interpret the outcome either as a tainted file successfully moving and then not appearing linked into the tree or as a tainted file failing to move and then not appearing linked into the tree.

The analysis of the a subcases for cases 2 and 3 is similar albeit slightly more complex, because 2a must be interpreted as a failed move and 3a must be interpreted as a successful move, to be consistent with the safety weaknesses allowed by the failure semantics.

In the b subcases, no semantic result is consistent with the distributed-system outcome. For example, in subcase 1b, the destination links the object but the source does not unlink it. If the refinement were to interpret this as a successful move, the destination file would become the object's parent, but because the source thinks the move failed, it still believes that it is the object's parent, so the object could pretend that the source is in fact its parent, which our failure semantics disallow. A similar argument holds for interpreting the result as a failed move. Since it is impossible to refine the subcase b outcomes, Farsite must prevent them.

Our protocol precludes the b subcases by ensuring that the non-faulty servers observe certain ordering constraints before declaring a move operation successful: The object server does not commit until after the source server commits, and the destination server does not commit until after the source and object servers commit. Examination of the table shows that this prevents the problematic subcases. For example, subcase 1b cannot
       a     failure   success     faulty     success         occur because the destination will not declare success
    3
       b    success    failure     faulty       none          until it first hears that the source has declared success,
      Table 1: Fault case analysis for move operation         which it has not.
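To make the case analysis concrete, the following Python sketch (our own encoding; the names, tuple keys, and string labels are illustrative, not Farsite's) enumerates Table 1 and checks that the commit-ordering rule excludes exactly the unrefinable b subcases:

```python
# Each of the three servers (object, source, destination) declares "success"
# or "failure" for a move; exactly one server per row is marked "faulty".
# Table 1 maps each disagreement pattern to its semantic interpretation,
# with None marking the unrefinable b subcases.
TABLE_1 = {
    # (object,    source,    dest)    -> semantic result
    ("faulty",  "success", "failure"): "either",   # 1a
    ("faulty",  "failure", "success"): None,       # 1b: unrefinable
    ("success", "faulty",  "failure"): "failure",  # 2a
    ("failure", "faulty",  "success"): None,       # 2b: unrefinable
    ("failure", "success", "faulty"):  "success",  # 3a
    ("success", "failure", "faulty"):  None,       # 3b: unrefinable
}

def ordering_ok(obj, src, dst):
    """True iff the non-faulty declarations respect the commit order
    (source first, then object, then destination); a faulty server's
    declaration imposes no constraint."""
    succeeded = lambda x: x == "success"
    failed = lambda x: x == "failure"
    if succeeded(obj) and failed(src):
        return False  # object commits only after the source commits
    if succeeded(dst) and (failed(src) or failed(obj)):
        return False  # destination commits only after source and object
    return True

# The ordering rule permits exactly the refinable rows of Table 1.
for (obj, src, dst), semantic in TABLE_1.items():
    assert ordering_ok(obj, src, dst) == (semantic is not None)
```

The assertion loop passes because in every "none" row a later-committing non-faulty server declares success while an earlier one declares failure, which the ordering rule forbids; the a subcases all respect the order.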
7     SUMMARY
    Although Byzantine Fault Tolerance (BFT) allows a trusted service to run on a peer-to-peer system of untrusted machines, it does not support scaling up to increase system throughput. Scale-up can be achieved by partitioning a workload among multiple BFT groups, but this leads to an increase in the probability of total system failure as the system scale increases.

    To solve this problem, we introduced Byzantine Fault Isolation (BFI), a methodology for using formal specification to constrain the semantic behavior of a faulty system. BFI modifies a system design to formally recognize that machines may become corrupt, wherein they have undefined local state and can send arbitrary messages to other machines. These modifications highlight areas that require design changes to restrict the spread of faulty information.

    Even in the presence of design features that restrict faults, corrupt machines may still affect the system's semantics. Thus, the BFI technique involves modifying the system's defined semantics to reflect the partial correctness achievable when the system is degraded. BFI uses formal specification to ensure that the modified system design satisfies the modified system semantics.

    We quantified the benefit of BFI to scalable systems: in a tree-structured system of 10^5 BFT groups, wherein a faulty group can corrupt its descendants' operations, BFI can enable 4-member BFT groups to achieve the same operational fault rate as 10-member BFT groups, without the corresponding 60% drop in throughput due to increased replication and message traffic.

    We employed the BFI technique in the design of the Farsite distributed file system, a large and complex peer-to-peer system designed to scale to 10^5 machines. BFI guided us toward adding specific redundancies, enriching messages, restricting authority, and constraining the order of distributed-operation steps. Using BFI, we related these design changes to the system's semantics, thereby showing that file corruption cannot spread to unrelated regions of the file-system namespace.

    Prior to BFI, no technique had addressed how to interconnect multiple BFT groups in a way that isolates Byzantine faults.

REFERENCES
[1]  M. Abd-El-Malek, G. Ganger, G. Goodson, M. Reiter, J. Wylie. "Fault-Scalable Byzantine Fault-Tolerant Services." 20th SOSP, 2005.
[2]  A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, R. P. Wattenhofer. "FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment." 5th OSDI, 2002.
[3]  Y. Azar, S. Kutten, B. Patt-Shamir. "Distributed Error Confinement." PODC, 2003.
[4]  M. Barborak, M. Malek. "The Consensus Problem in Fault-Tolerant Computing." ACM Computing Surveys, 25(2), 1993.
[5]  G. Bauer, H. Kopetz, W. Steiner. "Byzantine Fault Containment in TTP/C." Workshop on Real-Time LANs in the Internet Age, 2002.
[6]  M. Castro, B. Liskov. "Practical Byzantine Fault Tolerance." 3rd OSDI, 1999.
[7]  T. D. Chandra, S. Toueg. "Unreliable Failure Detectors for Asynchronous Systems." JACM, 43(2), 1996.
[8]  M. Demirbas, A. Arora, T. Nolte, N. Lynch. "A Hierarchy-Based Fault-Local Stabilizing Algorithm for Tracking in Sensor Networks." 8th OPODIS, 2004.
[9]  U. Engberg, P. Grønning, L. Lamport. "Mechanical Verification of Concurrent Systems with TLA." 4th Computer Aided Verification, 1992.
[10] S. Ghosh, A. Gupta, T. Herman, S. V. Pemmaraju. "Fault-Containing Self-Stabilizing Algorithms." PODC, 1996.
[11] M. P. Herlihy, J. M. Wing. "Specifying Graceful Degradation." IEEE TPDS, 2(1), 1991.
[12] K. P. Kihlstrom, L. E. Moser, P. M. Melliar-Smith. "The SecureRing Protocols for Securing Group Communication." Hawaii International Conference on System Sciences, 1998.
[13] A. W. Krings, M. A. McQueen. "A Byzantine Resilient Approach to Network Security." 29th International Symposium on Fault-Tolerant Computing, 1999.
[14] L. Lamport, R. Shostak, M. Pease. "The Byzantine Generals Problem." ACM TOPLAS, 4(3), 1982.
[15] L. Lamport. Specifying Systems. Addison-Wesley, 2003.
[16] C. S. Lewis, J. Saia. "Scalable Byzantine Agreement." Technical Report, University of New Mexico, 2004.
[17] D. Malkhi, Y. Mansour, M. K. Reiter. "On Diffusing Updates in a Byzantine Environment." 18th SRDS, 1999.
[18] T. Masuzawa, S. Tixeuil. "A Self-Stabilizing Link-Coloring Protocol Resilient to Unbounded Byzantine Faults in Arbitrary Networks." Laboratoire de Recherche en Informatique Report #1396, 2005.
[19] M. G. Merideth. "Enhancing Survivability with Proactive Fault-Containment." DSN Student Forum, 2003.
[20] Y. M. Minsky, F. B. Schneider. "Tolerating Malicious Gossip." Distributed Computing, 16(1), 2003.
[21] M. Nesterenko, A. Arora. "Tolerance to Unbounded Byzantine Faults." SRDS, 2002.
[22] V. Pappas, B. Zhang, A. Terzis, L. Zhang. "Fault-Tolerant Data Delivery for Multicast Overlay Networks." ICDCS, 2004.
[23] M. Pease, R. Shostak, L. Lamport. "Reaching Agreement in the Presence of Faults." JACM, 27(2), 1980.
[24] M. K. Reiter. "The Rampart Toolkit for Building High-Integrity Services." Theory and Practice in Distributed Systems (LNCS 938), 1995.
[25] Y. Sakurai, F. Ooshita, T. Masuzawa. "A Self-Stabilizing Link-Coloring Protocol Resilient to Byzantine Faults in Tree Networks." 8th OPODIS, 2004.
[26] F. Schneider. "Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial." ACM Computing Surveys, 22(4), 1990.