Distributed File Systems
(II)
Outline
Previous topics:
Introduction
Design of distributed file systems
Implementation of distributed file systems
Now:
Case studies: NFS, AFS
Sun's Network File System (NFS)
NFS is a popular and widely used network file system
NFS was originally designed and implemented by
Sun Microsystems for use on its UNIX-based
workstations
Other manufacturers now support it as well, for both
UNIX and other operating systems (including Linux,
MS-DOS, etc.)
NFS supports heterogeneous systems, for example,
MS-DOS clients making use of UNIX servers
It is not even required that all the machines use the
same hardware
Sun's Network File System (NFS)
Three aspects of NFS are of interest:
architecture
protocol
implementation
NFS Architecture
The basic idea behind NFS is to allow an arbitrary collection
of clients and servers to share a common file system
In most cases, all the clients and servers are on the same LAN
NFS allows every machine to be both a client and a server at
the same time
Server side:
Each NFS server exports one or more of its directories for
access by remote clients. When a directory is made
available, so are all of its sub-directories, so the entire
directory tree is exported as a unit
The list of directories a server exports is maintained in the
/etc/exports file, so these directories can be exported
automatically whenever the server is booted
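As a hypothetical illustration (host and path names invented; the exact option syntax varies by implementation, this is Linux-style), an /etc/exports file might contain:

    # directories this server makes available to remote clients
    /export/home      client1(rw)  client2(ro)
    /export/binaries  *.cs.example.edu(ro)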
NFS Architecture (cont.)
Client (workstation) side:
Clients access exported directories by mounting them.
When a client mounts a remote directory, it becomes part of the
client's directory hierarchy
A diskless workstation can mount a remote file system
on its root directory, resulting in a file system that is
supported entirely on a remote server
Those workstations that have a local disk can mount
remote directories anywhere they wish. To programs, there is no
visible difference between a remote file and a local file
If two or more clients mount the same directory at the
same time, they can communicate by sharing files in
their common directories
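For example (hypothetical names; on some systems the flag is -F nfs rather than -t nfs), a client could mount the directory exported by the machine "fileserver" with:

    # make the server's /export/home appear at /usr/home on the client
    mount -t nfs fileserver:/export/home /usr/home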
NFS Protocols
A protocol is a set of requests sent by clients to
servers, along with the corresponding replies
sent by the servers back to the clients
As long as a server recognizes and can handle all the
requests in the protocols, it need not know anything at
all about its clients
Clients can treat servers as black boxes that accept
and process a specific set of requests; how they do so is
their own business
NFS defines two protocols:
the protocol for mounting directories
the protocol for directory and file access
NFS Protocols : Mounting
Mounting protocol:
A client can send a path name to a server and request
permission to mount that directory somewhere in its directory
hierarchy.
The place where it is to be mounted is not contained in the
message, as the server does not care where it is to be
mounted.
If the path name is legal and the directory specified has been
exported, the server returns a file handle to the client.
The file handle contains fields uniquely identifying the file
system type, the disk, the i-node number of the directory, and
security information.
Subsequent calls to read and write files in the mounted
directory use the file handle.
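A minimal C sketch of what such a handle might contain (field names are illustrative, not the actual NFS wire format):

    /* Illustrative NFS-style file handle -- opaque to the client. */
    struct nfs_file_handle {
        unsigned int fs_type;     /* file system type on the server         */
        unsigned int device;      /* which disk/partition                   */
        unsigned int inode;       /* i-node number of the directory or file */
        unsigned int generation;  /* security / uniqueness information      */
    };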
Mount Protocol
NFS uses the mount protocol to access remote files
Mount protocol establishes a local name for remote files
Users access remote files using local names; OS takes care of the mapping
Automounting
Sun's version of UNIX also supports automounting
This feature allows a set of remote directories to be
associated with a local directory
None of these remote directories are mounted (or their
servers even contacted) when the client is booted
Instead, the first time a remote file is opened, the
operating system sends a message to each of the
servers. The first one to reply wins, and its directory is
mounted
NFS Automounting
Automounting has two principal advantages over static mounting:
First, in static mounting via the /etc/rc file, if one of the NFS servers
happens to be down, it is impossible to bring the client up -- at least not
without some difficulty, delay, and quite a few error messages
Second, by allowing the client to try a set of servers in parallel, a degree of
fault tolerance can be achieved (because only one of them needs to be up),
and the performance can be improved (by choosing the first one to reply,
presumably the least heavily loaded)
On the other hand, it is assumed that all the file systems specified
as alternatives for the automount are identical
Since NFS provides no support for file or directory replication, it is up to the
user to arrange for all the file systems to be the same
Thus, automounting is most often used for read-only file systems
containing system binaries and other files that rarely change
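As an example (server and path names hypothetical), a Sun-style automounter map entry listing several replica servers for a read-only directory could look like:

    # first server to reply gets mounted
    tools   -ro   server1:/export/tools   server2:/export/tools   server3:/export/tools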
NFS Protocols: Directory and File Access
Clients can send messages to servers to manipulate directories
and to read and write files. They can also access file attributes,
such as file mode, size, and time of last modification. Most UNIX
system calls are supported by NFS.
In NFS, each message is self-contained
The advantage of this scheme is that the server does not have to remember
anything about open connections in between calls to it. Thus, if a server
crashes and then recovers, no information about open files is lost, because
there is none.
A server like this that does not maintain state information
about open files is said to be a stateless server
In contrast, in UNIX System V, the Remote File System (RFS)
requires a file to be opened before it can be read or written.
The server then makes a table entry keeping track of the fact that the file is
open, and where the reader currently is, so each request need not carry an offset.
The disadvantage of this scheme is that if a server crashes and then
quickly reboots, all open connections are lost, and client programs fail.
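A sketch of the idea of a self-contained request (illustrative C structure, not the real NFS RPC format):

    /* Every request carries everything the server needs, so the server
       keeps no per-client state between calls. */
    struct read_request {
        unsigned char file_handle[32];  /* opaque handle from the mount protocol */
        unsigned long offset;           /* where in the file to start reading    */
        unsigned long count;            /* how many bytes to read                */
    };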
File System Operations (1) and (2)
An incomplete list of file system operations supported by NFS
NFS Protocols: Directory and File Access
The NFS scheme makes it difficult to achieve the exact
UNIX file semantics.
In UNIX, a file can be opened and locked so that other
processes cannot access it.
When the file is closed, the locks are released.
In a stateless server such as NFS, locks cannot be
associated with open files, because the server does
not know which files are open. NFS therefore needs a
separate, additional mechanism to handle locking.
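On the client, applications still use the ordinary UNIX interface; a request such as the sketch below is forwarded by the NFS client to that separate lock-manager service rather than to the stateless file server (path and function name are hypothetical):

    #include <fcntl.h>
    #include <unistd.h>

    /* Lock an entire file for writing. */
    int lock_whole_file(const char *path)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return -1;

        struct flock fl;
        fl.l_type   = F_WRLCK;   /* exclusive (write) lock        */
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 0;         /* 0 means "lock the whole file" */

        if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* block until granted */
            close(fd);
            return -1;
        }
        return fd;   /* closing this descriptor later releases the lock */
    }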
NFS Protocols: Directory and File Access
NFS uses the UNIX protection mechanism, with rwx bits for
the owner, group, and others.
Originally, each request message simply contained the user
and group ids of the caller, which the NFS server used to
validate the access. In effect, NFS trusted the clients not to cheat.
Currently, public key cryptography can be used to establish
a secure key for validating the client and server on each
request and reply.
When this option is enabled, a malicious client cannot impersonate
another client because it does not know that client's secret key.
As an aside, cryptography is used only to authenticate the
parties. The data themselves are never encrypted.
Network Information Service (NIS)
All the keys used for authentication, as well as other information, are
maintained by the NIS (Network Information Service)
The NIS was formerly known as the yellow pages
Its function is to store (key, value) pairs
When a key is provided, it returns the corresponding value.
Not only does it handle encryption keys, but it also stores the mapping of
user names to (encrypted) passwords, as well as the mapping of machine
names to network addresses, and other items.
The network information servers are replicated using a master/slave
arrangement
To read the data, a process can use either the master or any of the
slave copies.
However, all changes must be made on the master, which then
propagates them to the slaves.
As a result, there is a short interval after an update during which the
database is inconsistent.
Implementation: NFS Layer Structure
NFS Implementation
It consists of three layers:
System call layer:
This handles calls like OPEN, READ, and CLOSE.
Virtual file system (VFS):
The task of the VFS layer is to maintain a table with one entry for
each open file, analogous to the table of i-nodes for open files in
UNIX. The VFS layer has an entry, called a v-node (virtual i-node), for
every open file telling whether the file is local or remote.
NFS client code:
The NFS client code creates an r-node (remote i-node) in its internal
tables to hold the file handle; the v-node points to the r-node. Each
v-node in the VFS layer will ultimately contain either a pointer to an
r-node in the NFS client code, or a pointer to an i-node in the local
operating system. Thus, from the v-node it is possible to see whether a
file or directory is local or remote, and, if it is remote, to find its
file handle.
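A rough C sketch of the v-node / r-node relationship (type and field names invented for illustration):

    struct local_inode;                 /* stands in for the OS's own i-node */

    struct rnode {
        unsigned char file_handle[32];  /* opaque handle obtained from the server */
    };

    struct vnode {
        int is_remote;                  /* decided when the file is opened */
        union {
            struct local_inode *inode;  /* local file                      */
            struct rnode *rnode;        /* remote file                     */
        } u;
    };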
NFS Implementation (cont.)
NFS uses client caching to improve performance:
Transfers between client and server are done in large
chunks, normally 8 Kbytes, even if fewer bytes are
requested. As soon as a chunk arrives, the next one is
requested in advance; this is known as read ahead.
The same holds for writes: if a write system call writes fewer
than 8 Kbytes, the data are just accumulated locally.
Only when the entire 8K chunk is full is it sent to the
server. However, when a file is closed, all of its data
are sent to the server immediately.
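A highly simplified sketch of this write buffering (names are hypothetical; a real client tracks offsets and many blocks at once):

    #include <stddef.h>
    #include <string.h>

    #define CHUNK 8192   /* the 8 Kbyte transfer size */

    /* Hypothetical RPC wrapper that ships one chunk to the server. */
    void send_chunk_to_server(const void *data, size_t len);

    static unsigned char buf[CHUNK];
    static size_t buffered;

    /* Small writes only accumulate locally (assume n fits in the chunk). */
    void cached_write(const void *data, size_t n)
    {
        memcpy(buf + buffered, data, n);
        buffered += n;
        if (buffered == CHUNK) {          /* full chunk: send it now */
            send_chunk_to_server(buf, buffered);
            buffered = 0;
        }
    }

    /* On close, whatever has accumulated is flushed immediately. */
    void cached_close(void)
    {
        if (buffered > 0)
            send_chunk_to_server(buf, buffered);
        buffered = 0;
    }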
NFS Implementation (cont.)
Client caching improves performance
Problem: two clients cache the same file block and one of them
modifies it. When the other one reads the block, it gets the old value.
Solutions:
Solution 1:
Associate a timer with each cache block; when the timer expires, the
entry is discarded. Normally, the timer is 3 sec. for data blocks and
30 sec. for directory blocks.
Solution 2:
Whenever a cached file is opened, a message is sent to the server to
find out when the file was last modified.
If the last modification occurred after the local copy was cached, the
cached copy is discarded and a new copy is fetched from the server.
Finally, once every 30 sec. a cache timer expires, and all the dirty
blocks in the cache are sent to the server.
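A sketch of the second solution, validating the cached copy on open (helper names are hypothetical):

    #include <time.h>

    time_t server_mtime(const char *path);        /* ask server for last-modified time */
    void   refetch_into_cache(const char *path);  /* discard and fetch a fresh copy    */

    struct cache_entry {
        time_t cached_mtime;   /* server mtime recorded when the copy was cached */
    };

    void validate_on_open(const char *path, struct cache_entry *e)
    {
        time_t current = server_mtime(path);
        if (current > e->cached_mtime) {   /* modified since we cached it */
            refetch_into_cache(path);
            e->cached_mtime = current;
        }
    }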
NFS Implementation (cont.)
Criticism:
NFS has been widely criticized for not implementing
the proper UNIX semantics
A write to a file on one client may or may not be
seen when another client reads the file, depending
on the timing
When a file is created, it may not be visible to the
outside world for as much as 30 sec.
NFS Implementation (cont.)
Lessons learned:
Workstations have cycles to burn, so do it on the
client-side, not the server-side
Cache whenever possible
Exploit the usage properties
Minimize systemwide knowledge and change
Trust the fewest possible entities
Batch work where possible
The Andrew File System (AFS)
A different approach to remote file access
Meant to service a large organization
Such as a university campus
Scaling is a major goal
Basic AFS Model
Files are stored permanently at file server
machines
Users work from workstation machines
With their own private namespace
Andrew provides mechanisms to cache users'
files from the shared namespace
Basic AFS Model (cont.)
User model of AFS use:
Sit down at any AFS workstation anywhere
Log in and authenticate who I am
Access all files without regard to which workstation I'm
using
The local namespace:
Each workstation stores a few files
Mostly system programs and configuration files
Workstations are treated as generic, interchangeable
entities
Virtue and Vice
Vice is the system run by the file servers
Distributed system
Virtue is the protocol client workstations
use to communicate with Vice
Overall Architecture
System is viewed as a WAN composed
of LANs
Each LAN has a Vice cluster server
Which stores local files
But Vice makes all files available to all
clients
AFS Architecture Diagram
(Diagram: several LANs, each with its own Vice cluster server, connected by a WAN)
Caching the User Files
Goal is to offload work from servers to clients
When must servers do work?
To answer requests
To move data
Whole files cached at clients. Why? Reasons:
Minimizes communications with server
Most files used in entirety, anyway
Easier cache management problem
Requires substantial free disk space on workstations
Doesn't address the problem of huge files
The Shared Namespace
An Andrew installation has global shared
namespace
All clients' files are viewed in the namespace
with the same names
High degree of name and location transparency
How do servers provide the namespace?
Files are organized into volumes
Volumes are grafted together into overall
namespace
Each file has globally unique ID
Volumes are stored at individual servers
But a volume can be moved from server to server
Finding a File
At a high level, files have names
Directory translates name to unique ID
If client knows where the volume is, it simply
sends unique ID to appropriate server
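A sketch of what such a unique ID (the AFS "fid") contains (field names illustrative):

    struct afs_fid {
        unsigned int volume;   /* which volume, globally unique              */
        unsigned int vnode;    /* which file within the volume               */
        unsigned int unique;   /* uniquifier, so vnode numbers can be reused */
    };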
Finding a Volume
What if you enter a new volume?
How do you find which server stores the volume?
Volume-location database stored on each server
Once information on volume is known, client
caches it
Moving a Volume
When a volume moves from server to server,
update database
Heavyweight distributed operation
What about clients with cached information?
Old server maintains forwarding info
Also eases server update
Handling Cached Files: Venus
Files fetched transparently when needed
File system traps opens
Sends them to local Venus process
The Venus Daemon:
Responsible for managing a single client's cache
Caches files on open
Writes modified versions back on close
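A schematic of Venus on open and close (helper names hypothetical; real Venus also consults callbacks and handles failures):

    int  in_local_cache(const char *path);
    void fetch_whole_file_from_vice(const char *path);
    void store_whole_file_to_vice(const char *path);
    int  open_local_copy(const char *path);

    /* Whole-file caching: fetch on open, write back on close. */
    int venus_open(const char *path)
    {
        if (!in_local_cache(path))
            fetch_whole_file_from_vice(path);
        return open_local_copy(path);        /* reads and writes are then local */
    }

    void venus_close(const char *path, int was_modified)
    {
        if (was_modified)
            store_whole_file_to_vice(path);  /* server sees the update on close */
    }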
Consistency for AFS
If my workstation has a locally cached copy
of a file, what if someone else changes it?
Callbacks used to invalidate my copy
Requires servers to keep info on who caches
files
Write Consistency in AFS
What if I write to my cached copy of a file?
Need to get write permission from server
Which invalidates anyone else's callback
Permission obtained on open for write
Need to obtain new data at this point
Initially, written only to local copy
On close, Venus sends update to server
Server will invalidate callbacks for other copies
Extra mechanism to handle failures
Storage of Andrew Files
Stored in UNIX file systems
Client cache is a directory on local machine
Low-level names do not match Andrew names
Venus Cache Management
Venus keeps two caches: status and data
Status cache kept in virtual memory
For fast attribute lookup
Data cache kept on disk
Venus Process Architecture
Venus is a single user-level process
But multithreaded
Uses RPC to talk to the server
RPC is built on a low-level datagram service
AFS Security
Only the servers (Vice) are trusted here
Client machines might be corrupted
No client programs run on Vice machines
Clients must authenticate themselves to servers
Encryption used to protect transmissions
AFS File Protection
AFS supports access control lists
Each file has a list of users who can access it
And permitted modes of access
Maintained by Vice
Used to mimic UNIX access control
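For instance, in AFS implementations such as OpenAFS, ACLs are managed with the fs command (directory, user names, and rights here are hypothetical examples):

    # give "chris" read and lookup rights, and "pat" all rights, on a directory
    fs setacl /afs/example.edu/usr/pat/shared  chris rl  pat rlidwka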
AFS Read-Only Replication
For volumes containing files that are used frequently,
but not changed often (e.g. executables), AFS allows
multiple servers to store read-only copies