06-06798 Distributed Systems
Lecture 7:
Distributed File Systems

Overview
• Requirements for distributed file systems
   – transparency, performance, fault-tolerance, ...
• Design issues
   – possible options, architectures
   – file sharing, concurrent updates
   – caching
• Example
   – Sun NFS
                               Distributed Systems             1                               Distributed Systems                     2
Characteristics of file systems
• Operations on files (= data + attributes)
   – create/delete
   – query/modify attributes
   – open/close
   – read/write
   – access control
• Storage organisation
   – directory structure (hierarchical, pathnames)
   – metadata (file management information)
        • file attributes
        • directory structure info, etc

File attributes
   File length
   Creation timestamp
   Read timestamp
   Write timestamp
   Attribute timestamp
   Reference count
   Owner                 (user controlled)
   File type             (user controlled)
   Access control list   (user controlled)
Distributed file system requirements
• Transparency (clients unaware of the distributed nature)
   – access transparency (client unaware of distribution of files, same interface for local/remote files)
   – location transparency (uniform file name space from any client workstation)
   – mobility transparency (files can be moved from one server to another without affecting client)
   – performance transparency (client performance not affected by load on service)
   – scaling transparency (expansion possible if numbers of clients increase)

Distributed file system requirements (continued)
   – Concurrent file updates (changes by one client do not interfere with other clients using the same file)
   – File replication (for load sharing, fault-tolerance)
   – Heterogeneity (interface platform-independent)
   – Fault-tolerance (continues to operate in the face of client and server failures)
   – Consistency (one-copy update semantics or slight variations)
   – Security (access control)
   – Efficiency (performance comparable to conventional file systems)
File Service Design Options
• Stateful
   – server holds information on open files, current position, file locks
   – open before access, close after
   – better performance: shorter message, read-ahead possible
   – server failure: lose state
   – client failure: tables fill up
   – can provide file locks

File Service Design Options (continued)
• Stateless
   – no state information held by server
   – file operations idempotent, must contain all information needed (longer message)
   – simpler file server design
   – can recover easily from client or server crash
   – locking requires extra lock server to hold state
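
The stateful and stateless options can be contrasted in a few lines. The following is a minimal sketch rather than a real file server: the class names, the `crash_and_restart` helper and the in-memory `files` dictionary are all assumptions made for illustration.

```python
class StatefulServer:
    """Hypothetical server that remembers open files and positions."""

    def __init__(self, files):
        self.files = files          # name -> bytes (survives a crash)
        self.open_table = {}        # handle -> [name, position] (volatile)
        self.next_handle = 0

    def open(self, name):
        handle = self.next_handle
        self.next_handle += 1
        self.open_table[handle] = [name, 0]
        return handle

    def read(self, handle, n):
        # Short request: the server knows the current position.
        name, pos = self.open_table[handle]
        data = self.files[name][pos:pos + n]
        self.open_table[handle][1] = pos + len(data)
        return data

    def crash_and_restart(self):
        self.open_table = {}        # server failure: state is lost


class StatelessServer:
    """Hypothetical server that holds no per-client state."""

    def __init__(self, files):
        self.files = files

    def read(self, name, offset, n):
        # Longer request: it carries everything needed, so it is idempotent.
        return self.files[name][offset:offset + n]

    def crash_and_restart(self):
        pass                        # nothing to lose


files = {"notes.txt": b"distributed file systems"}

stateful = StatefulServer(dict(files))
h = stateful.open("notes.txt")
first = stateful.read(h, 11)
stateful.crash_and_restart()
try:
    stateful.read(h, 5)
    stateful_survived = True
except KeyError:                    # the handle is unknown after the restart
    stateful_survived = False

stateless = StatelessServer(dict(files))
before = stateless.read("notes.txt", 12, 4)
stateless.crash_and_restart()
after = stateless.read("notes.txt", 12, 4)   # the same request still works
```

In this sketch the stateful client has to reopen the file after a server restart, while the stateless client simply repeats its request.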
File Service Architecture
[Figure: on the client computer, application programs call the client module (API: knows open files, positions...). The client module communicates by RPC with the server computer, which runs the directory service (text names to UFIDs) and the flat file service (UFIDs, operations on contents).]

File server architecture
Components (for openness):
• Flat file service
   – operations on file contents
   – unique file identifiers (UFIDs)
   – translates UFIDs to file locations
• Directory service
   – mapping from text names to UFIDs
• Client module
   – API for file access, one per client computer
   – holds state: open files, positions
   – knows network location of flat file & directory server
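
The division of labour between the three components can be sketched as follows. This is an illustrative toy, not an actual implementation: RPC is replaced by direct method calls, and every class and method name here is an assumption. The point it shows is that open files and positions live only in the client module.

```python
class FlatFileService:
    """Stateless store of file contents, addressed only by UFID."""

    def __init__(self):
        self._contents = {}          # UFID -> bytearray
        self._next_ufid = 1

    def create(self):
        ufid = self._next_ufid
        self._next_ufid += 1
        self._contents[ufid] = bytearray()
        return ufid

    def read(self, ufid, start, n):
        return bytes(self._contents[ufid][start:start + n])

    def write(self, ufid, start, data):
        buf = self._contents[ufid]
        if start > len(buf):         # pad a gap with zero bytes
            buf.extend(b"\0" * (start - len(buf)))
        buf[start:start + len(data)] = data


class DirectoryService:
    """Maps text names to UFIDs."""

    def __init__(self):
        self._names = {}

    def add_name(self, name, ufid):
        self._names[name] = ufid

    def lookup(self, name):
        return self._names[name]


class ClientModule:
    """Client-side API; the only place that knows open files and positions."""

    def __init__(self, flat, directory):
        self._flat = flat
        self._directory = directory
        self._open = {}              # descriptor -> [UFID, position]
        self._next_fd = 3

    def create(self, name):
        self._directory.add_name(name, self._flat.create())
        return self.open(name)

    def open(self, name):
        fd = self._next_fd
        self._next_fd += 1
        self._open[fd] = [self._directory.lookup(name), 0]
        return fd

    def read(self, fd, n):
        ufid, pos = self._open[fd]
        data = self._flat.read(ufid, pos, n)
        self._open[fd][1] = pos + len(data)
        return data

    def write(self, fd, data):
        ufid, pos = self._open[fd]
        self._flat.write(ufid, pos, data)
        self._open[fd][1] = pos + len(data)

    def close(self, fd):
        del self._open[fd]


flat, directory = FlatFileService(), DirectoryService()
client = ClientModule(flat, directory)
fd = client.create("report.txt")
client.write(fd, b"hello world")
client.close(fd)
fd = client.open("report.txt")
first = client.read(fd, 5)
second = client.read(fd, 6)
client.close(fd)
```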
Flat file service RPC interface
• Used by client modules, not user programs
   – FileId (UFID) uniquely identifies file
   – invalid if file not present or inappropriate access
   – Read/Write; Create/Delete; Get/SetAttributes
• No open/close! (unlike UNIX)
   – access immediate with FileId
   – Read/Write identify starting point
• Improved fault-tolerance
   – operations idempotent except Create, can be repeated (at-least-once RPC semantics)
   – stateless service

Access control
• In UNIX file system
   – access rights are checked against the access mode (read, write, execute) in open
   – user identity checked at login time, cannot be tampered with
• In distributed systems
   – access rights must be checked at server
        • RPC unprotected
        • forging identity possible, a security risk
   – user id typically passed with every request (e.g. Sun NFS)
   – stateless
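
The idempotence of the flat file operations and the per-request access check can be illustrated together. This is a sketch under stated assumptions: the `FlatFileServer` class, its access control list layout and the user names are hypothetical, and the server simply trusts whatever user id the request carries.

```python
class AccessDenied(Exception):
    pass


class FlatFileServer:
    """Hypothetical stateless server: every request names the user and the FileId."""

    def __init__(self):
        self.files = {}      # FileId -> bytearray
        self.acl = {}        # FileId -> {user id: set of rights}
        self.next_id = 1

    def create(self, user_id):
        # Not idempotent: each call makes a new file.
        file_id = self.next_id
        self.next_id += 1
        self.files[file_id] = bytearray()
        self.acl[file_id] = {user_id: {"read", "write"}}
        return file_id

    def _check(self, user_id, file_id, right):
        # Rights are checked at the server on every request.
        if file_id not in self.files:
            raise KeyError(file_id)
        if right not in self.acl[file_id].get(user_id, set()):
            raise AccessDenied(user_id)

    def write(self, user_id, file_id, start, data):
        self._check(user_id, file_id, "write")
        buf = self.files[file_id]
        if start > len(buf):
            buf.extend(b"\0" * (start - len(buf)))
        buf[start:start + len(data)] = data

    def read(self, user_id, file_id, start, n):
        self._check(user_id, file_id, "read")
        return bytes(self.files[file_id][start:start + n])


server = FlatFileServer()
fid = server.create("alice")
server.write("alice", fid, 0, b"abc")
server.write("alice", fid, 0, b"abc")        # repeated request, same result
content = server.read("alice", fid, 0, 10)
second_fid = server.create("alice")          # repeated Create, second file

try:
    server.read("bob", fid, 0, 10)
    bob_allowed = True
except AccessDenied:
    bob_allowed = False

# The loophole: a request that merely claims to come from "alice" is accepted.
forged = server.read("alice", fid, 0, 10)
```

Because Read and Write carry an explicit starting point, a client that times out can resend the same request safely; only Create needs extra care.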
Directory structure
• Hierarchical
   – tree-like, pathnames from root
   – (in UNIX) several names per file (link operation)
• Naming system
   – implemented by client module, using directory service
   – root has well-known UFID
   – locate file following path from root

File names
Text name (= directory pathname + file name)
• hostname:local name
   – not mobility transparent
• uniform name structure (same name space for all clients)
• remote mount (e.g. Sun NFS)
   – remote directory inserted into local directory
   – relies on clients maintaining consistent naming conventions across all clients
        • all clients must implement same local tree
        • must mount remote directory into the same local directory
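
A remote mount can be pictured as a per-client table that maps a local directory to a directory on some server. The sketch below is only an illustration: the `translate` function, the server names and the mount points are assumptions, not part of NFS itself.

```python
# Hypothetical mount table on one client:
# local mount point -> (server, remote directory)
MOUNTS = {
    "/usr/students": ("server1", "/export/people"),
    "/usr/staff": ("server2", "/nfs/users"),
}


def translate(path, mounts=MOUNTS):
    """Return (server, remote path), or (None, path) for a local file."""
    # Try the longest mount point first so that nested mounts resolve correctly.
    for mount_point in sorted(mounts, key=len, reverse=True):
        if path == mount_point or path.startswith(mount_point + "/"):
            server, remote_dir = mounts[mount_point]
            return server, remote_dir + path[len(mount_point):]
    return None, path


remote = translate("/usr/students/jon")
local = translate("/vmunix")

# A second client that mounts the same remote directory somewhere else
# no longer shares the name /usr/students/jon with the first client.
other_client_mounts = {"/home/students": ("server1", "/export/people")}
mismatch = translate("/usr/students/jon", other_client_mounts)
```

The last call shows why all clients have to mount a remote directory at the same local place: otherwise one text name refers to different files, or to nothing, on different clients.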
Remote mount
[Figure: three directory trees. Server 1: (root) / export / people / {big, jon, bob, ...}. Client: (root) / {..., vmunix, usr} and usr / {students, x, staff}. Server 2: (root) / nfs / users / {jim, ann, jane, joe}. Remote mount arrows join students on the client to people on Server 1, and staff on the client to users on Server 2.]
Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1;
the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2.

Directory service
• Directory
   – conventional file (client of the flat file service)
   – mapping from text names to UFIDs
• Operations
   – require FileId (machine-readable UFID) as parameter
   – locate file (LookUp)
   – add/delete file (AddName/UnName)
   – match file names to regular expression (GetNames)
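
The directory service operations, and the way a client module follows a path from the root, can be sketched as below. The operation names follow the slide (LookUp, AddName, UnName, GetNames); everything else is assumed for illustration, including the `make_dir` helper, the UFID values, the choice of 0 as the well-known root UFID, and keeping directories in a dictionary rather than in flat files.

```python
import re

ROOT_UFID = 0   # assumed well-known UFID of the root directory


class DirectoryService:
    def __init__(self):
        self._dirs = {}                      # directory UFID -> {name: UFID}

    def make_dir(self, ufid):                # hypothetical helper for the demo
        self._dirs[ufid] = {}

    def lookup(self, dir_ufid, name):
        try:
            return self._dirs[dir_ufid][name]
        except KeyError:
            raise FileNotFoundError(name) from None

    def add_name(self, dir_ufid, name, ufid):
        entries = self._dirs[dir_ufid]
        if name in entries:
            raise FileExistsError(name)
        entries[name] = ufid

    def un_name(self, dir_ufid, name):
        if name not in self._dirs[dir_ufid]:
            raise FileNotFoundError(name)
        del self._dirs[dir_ufid][name]

    def get_names(self, dir_ufid, pattern):
        rx = re.compile(pattern)
        return sorted(n for n in self._dirs[dir_ufid] if rx.fullmatch(n))


def resolve(service, path):
    """Locate a file by following the path from the root, one LookUp per component."""
    ufid = ROOT_UFID
    for component in path.split("/"):
        if component:                        # skip empty parts such as the leading "/"
            ufid = service.lookup(ufid, component)
    return ufid


ds = DirectoryService()
for directory_ufid in (ROOT_UFID, 1, 2):
    ds.make_dir(directory_ufid)
ds.add_name(ROOT_UFID, "usr", 1)
ds.add_name(1, "staff", 2)
for name, ufid in (("jim", 10), ("ann", 11), ("jane", 12), ("joe", 13)):
    ds.add_name(2, name, ufid)

jane = resolve(ds, "/usr/staff/jane")
j_names = ds.get_names(2, "j.*")
ds.un_name(2, "joe")
j_names_after = ds.get_names(2, "j.*")
```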
File sharing
Multiple clients share the same file for read/write access.
• One-copy update semantics
   – every read sees the effect of all previous writes
   – a write is immediately visible to clients who have the file open for reading
• Problems!
   – caching: maintaining consistency between several copies difficult to achieve
   – serialise access by using file locks (affects performance)
   – trade-off between consistency and performance

Example: Sun NFS (1985)
• Structure of flat file & client & directory service
• NFS protocol
   – RPC based, OS independent (originally UNIX)
• NFS server
   – stateless (no open/close)
   – no locks or concurrency control
   – no replication with updates
• Virtual file system, remote mount
• Access control (user id with each request)
   – security loophole (modify RPC to impersonate user…)
• Client and server caching
NFS architecture
[Figure: on the client computer, application programs make UNIX system calls into the UNIX kernel. There the virtual file system passes local requests to the UNIX file system (or another file system) and remote requests to the NFS client. The NFS client talks to the NFS server in the server computer's UNIX kernel using the NFS protocol over RPC (UDP or TCP). The NFS server reaches the server's UNIX file system through its own virtual file system.]

File identifier (FileId)
Simple Solution
   – i-node (number identifying file within file system)
   – file migration requires finding and changing all FileIds
   – UNIX reuses i-node numbers after file deleted (i-node gen. no)
   FileId:        | Server address    | Index         |
                  | IP address.socket | i-node number |
NFS file handle
   Virtual file system uses i-node if local, file handle if remote.
   File handle:   | File system identifier | i-node no. | i-node gener. no. |
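
The role of the i-node generation number in the file handle can be shown with a small model. This is a sketch, not the NFS data structure: `FileHandle`, `InodeTable` and `StaleHandle` are hypothetical names, and the table only tracks which i-node numbers are in use and how often each has been reused.

```python
from collections import namedtuple

# The three fields named on the slide.
FileHandle = namedtuple("FileHandle", "fs_id inode generation")


class StaleHandle(Exception):
    pass


class InodeTable:
    """Hypothetical server-side table for one file system."""

    def __init__(self, fs_id):
        self.fs_id = fs_id
        self.generation = {}     # i-node number -> current generation number
        self.in_use = set()

    def allocate(self, inode):
        # Each reuse of an i-node number bumps its generation number.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.in_use.add(inode)
        return FileHandle(self.fs_id, inode, self.generation[inode])

    def free(self, inode):
        self.in_use.discard(inode)

    def check(self, handle):
        if (handle.fs_id != self.fs_id
                or handle.inode not in self.in_use
                or self.generation.get(handle.inode) != handle.generation):
            raise StaleHandle(handle)
        return True


table = InodeTable(fs_id=7)
old = table.allocate(42)     # a client keeps this handle
table.free(42)               # the file is deleted
new = table.allocate(42)     # the i-node number is reused for a new file

try:
    table.check(old)
    old_ok = True
except StaleHandle:
    old_ok = False
new_ok = table.check(new)
```

Without the generation field, the old handle would silently refer to the new file that happens to occupy i-node 42.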
Caching in NFS
• Indispensable for performance
• Caching
   – retains recently used data (file pages, directories, file attributes) in cache
   – updates data in cache for speed
   – block size typically 8 kbytes
• Server caching
   – cache in server memory (UNIX kernel)
• Client caching
   – cache in client memory, local disk

Server caching
• Store data in server memory
• Read-ahead: anticipate which pages to read
• Delayed write
   – update in cache; write to disk periodically (UNIX sync to synchronise cache) or when space needed
   – which contents seen by users depends on timing
• Write through
   – cache and write to disk (reliable, poor performance)
• Write on close
   – write to disk only when commit received (fast but problems with files open for a long time)
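
The trade-off between delayed write and write through can be made concrete with a toy block cache. The `CachedDisk` class, its `crash` method and the write counter are assumptions introduced for this sketch; write on close is not modelled.

```python
class CachedDisk:
    """Hypothetical server block cache with two write policies."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.cache = {}          # block number -> bytes (lost in a crash)
        self.disk = {}           # block number -> bytes (survives a crash)
        self.dirty = set()       # cached blocks not yet on disk
        self.disk_writes = 0

    def _to_disk(self, block):
        self.disk[block] = self.cache[block]
        self.disk_writes += 1

    def write(self, block, data):
        self.cache[block] = data
        if self.write_through:
            self._to_disk(block)       # reliable, one disk write per update
        else:
            self.dirty.add(block)      # delayed write: disk updated on sync

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = self.disk[block]
        return self.cache[block]

    def sync(self):
        for block in sorted(self.dirty):
            self._to_disk(block)
        self.dirty.clear()

    def crash(self):
        self.cache.clear()
        self.dirty.clear()


delayed = CachedDisk(write_through=False)
for version in (b"v1", b"v2", b"v3"):
    delayed.write(0, version)
writes_before_sync = delayed.disk_writes
delayed.sync()
writes_after_sync = delayed.disk_writes
delayed.write(1, b"unsaved")
delayed.crash()
lost = 1 not in delayed.disk           # the unsynced block never reached disk

through = CachedDisk(write_through=True)
for version in (b"v1", b"v2", b"v3"):
    through.write(0, version)
through.crash()
```

In this model three updates to one block cost a single disk write under delayed write but three under write through, and only delayed write loses data in the crash.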
Client caching
• Potential consistency problems!
   – different versions, portions of files, check if copy still valid
• Timestamp method
   – tag with latest time of validity check and modification time
   – copy valid if time since last check less than freshness interval, or modification time on server the same
   – choose freshness interval adaptively
• Reads
   – perform validity check, if not valid, request data from server, optimisations
• Writes
   – After modification, marked as dirty and flushed
• Not truly one-copy update semantics...

Summary
• File service
   – crucial to the running of a distributed system
   – performance, consistency and easy recovery essential
• Design issues
   – separate flat file service from directory service and client module
   – stateless for performance and fault-tolerance
   – caching for performance
   – concurrent updates difficult with caching
   – approximation of one-copy update semantics
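
The timestamp method for client caching amounts to a two-part validity test. The sketch below assumes a `CacheEntry` record holding the time of the last validity check (`tc`) and the server modification time recorded with the copy (`tm`); these names, the explicit `now` argument and the fake server are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: bytes
    tc: float    # time of the last validity check
    tm: float    # modification time on the server when the copy was fetched


def is_valid(entry, now, freshness, get_server_mtime):
    """Copy is valid if it was checked recently, or if the server copy is unchanged."""
    if now - entry.tc < freshness:
        return True                      # within the freshness interval: no server contact
    if get_server_mtime() == entry.tm:
        entry.tc = now                   # revalidated, restart the interval
        return True
    return False                         # caller must fetch the data again


server = {"mtime": 50.0, "checks": 0}


def get_server_mtime():
    server["checks"] += 1
    return server["mtime"]


entry = CacheEntry(b"v1", tc=100.0, tm=50.0)
r1 = is_valid(entry, 101.0, 3.0, get_server_mtime)   # fresh, server not asked
checks_after_r1 = server["checks"]
r2 = is_valid(entry, 105.0, 3.0, get_server_mtime)   # interval expired, revalidated
server["mtime"] = 105.5                              # another client writes the file
r3 = is_valid(entry, 106.0, 3.0, get_server_mtime)   # still treated as valid
r4 = is_valid(entry, 109.0, 3.0, get_server_mtime)   # change finally detected
```

The third check is the interesting one: the cached copy is already out of date but is still accepted, which is why this scheme only approximates one-copy update semantics.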