0% found this document useful (0 votes)
6 views32 pages

#20 (File System Logging)

The lecture discusses filesystem logging as a method for crash recovery and consistency, emphasizing the importance of avoiding direct writes to on-disk data structures. It outlines the process of logging writes to a temporary log before committing them to the actual disk, which helps maintain atomicity and prevents corruption during crashes. The lecture also addresses challenges in logging, such as concurrent system calls and the need for efficient data handling, while comparing the xv6 logging method to other approaches like ordered mode in ext3/4 FS.

Uploaded by

janyusername
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views32 pages

#20 (File System Logging)

The lecture discusses filesystem logging as a method for crash recovery and consistency, emphasizing the importance of avoiding direct writes to on-disk data structures. It outlines the process of logging writes to a temporary log before committing them to the actual disk, which helps maintain atomicity and prevents corruption during crashes. The lecture also addresses challenges in logging, such as concurrent system calls and the need for efficient data handling, while comparing the xv6 logging method to other approaches like ordered mode in ext3/4 FS.

Uploaded by

janyusername
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 32

CSE 330:

Operating Systems

Adil Ahmad
Lecture #20: Filesystem logging (crash recovery and consistency)
FS optimization(s) recap
What is the difference between the two workloads?

Ø Spatial locality: addresses are Ø Temporal locality: addresses are


close to each other in “space” close to each other in “time”
Which of the following is actually better for disks?

Ø Spatial locality: addresses are Ø Temporal locality: addresses are


close to each other in “space” close to each other in “time”
Imagine that we must choose inode and data blocks

Ø Is this a good or a bad policy to choose blocks?


Ø Bad policy; will require slow “random” access
A better policy would be as follows

Ø Is there something else that we can optimize?


Ø Location of the inode itself, in this particular case. J
An even better policy
FS optimization: buffering in memory
Avoiding excessive I/O operations is important

§ Even with all possible disk optimizations, I/O remains slow and
unpredictable depending on workloads

§ File system will cache “files” in memory (recall how the xv6 block layer
caches “blocks”) to improve performance

§ To do so, the file system will:


Ø Reserve a region for the “file cache”
Ø Find out which files are frequently read/written
Ø Keep those files within the cache
Modern OSs take a “unified” cache approach

§ Keep a single cache for (a) file pages and (b) program memory pages

§ Avoids having multiple caches, and separately dealing with them


Buffering creates a tiny headache for the file system..

§ Imagine the following set of events from an FS:

1. Program writes to a new 1G file (e.g., bar)


2. FS caches all writes to the file in-memory
3. FS updates the inodes within the disk
4. FS writes all changes to data blocks within disk (e.g., block-by-
block using Linux’s block I/O)

§ Can you spot potential problems?


Ø If power goes off after 3, your system has already reserved the data
block à FS lost a few free data blocks
Worse problems could also occur on crashes

§ Imagine the following set of events from an FS:

1. Imagine that the FS is deleting a file X:


2. Frees block for X and changes its data bitmap
3. Before FS can delete the inode for the block, power goes off.
4. OS restarts; thinks block is free and allocates it to a new file Y.

§ Can you spot potential problems?


Ø Two inodes are pointing to the same data blocks à write to two
different files will overwrite each other

§ Can this be a security issue?


Ø Yes, imagine two different users get allocated the blocks
Crash recovery is a significant problem for FSs

§ Many filesystem operations require multiple writes to disk


§ creat (file) à 5 disk blocks to be written
§ inode, directory, and actual file data

§ A crash (e.g., power failure) after a subset of these writes might put the
filesystem in an inconsistent state

§ Note this is only a problem for write operations. Why is that?


Ø Data/Inode blocks only get changed on writes, not reads.
How would you solve this problem of FS corruption on crashes?

§ Imagine the following set of events from an FS:

1. Program writes to a new 1G file (e.g., bar)


2. FS caches all writes to the file in-memory
3. FS updates the inodes within the disk
4. FS writes all changes to data blocks within disk (e.g., block-by-
block using Linux’s block I/O)

§ Can you spot potential problems?


Ø If power goes off after 3, your system has already reserved the data
block à FS lost a few free data blocks
How would you solve this problem of FS corruption on crashes?

§ Imagine the following set of events from an FS:

1. Imagine that the FS is deleting a file X:


2. Frees block for X and changes its data bitmap
3. Before FS can delete the inode for the block, power goes off.
4. OS restarts; thinks block is free and allocates it to a new file Y.

§ Can you spot potential problems?


Ø Two inodes are pointing to the same data blocks à write to two
different files will overwrite each other
Ensuring correctness after system crash
Filesystem logging addresses crash problems

Let’s look at the high-level idea of logging:

§ Never write directly to the on-disk data structures

§ Place all writes to a “log” on the disk

§ Once all log writes in one operation are complete, issue a “commit”

§ FS writes the data to actual disk locations using “log”

§ FS removes “log” entry


Does logging prevent corruption on crashes?

Let’s look at the high-level idea of logging: Discard all writes on


reboot after crash here
§ Never write directly to the on-disk data structures

§ Place all writes to a “log” on the disk All needed data is in disk
to complete write
§ Once all log writes in one operation are complete, issue a “commit”

§ FS writes the data to actual disk locations using “log” Can recover the data using
log on crash here
§ FS removes “log” entry

Ensures atomicity: all writes in a transaction are logged or none


Where is the log in this figure BTW?

Log

Ø We must reserve it as well in some fixed part of the disk, and note its
information (e.g., starting block, size) in the superblock
Code review: xv6’s crash recovery through logging

Step #1: Signaling start of log at the write system call

Step #2: Writing data to in-memory buffers

Step #3: Tracking the buffers that must be flushed to disk

Step #4: Commit cache to log disk sectors

Step #5: Moving data from log to final disk locations


Step #1: Signaling the start of a log at write syscall

write(..) system call

… Is log being committed?


Signalling starts here

Perform the writes now Is there enough space?


Called at the end
Step #2: Writing data to in-memory buffers
Called from filewrite()


Try to find a cache to keep
block in-memory
Read block from disk

Copy data from userspace


Similar to the copy_from_user(..)
you have been using for kernel modules
Step #3: Tracking that a buffer must be written later

Log metadata structures


struct log { struct logheader {
struct logheader; int n;
… int block[LOGSIZE];
} }

Must keep metadata for log


log_write(.. Reuse if already allocated
)

Place each block into an empty log block


Step #4: Commit cache to log sectors
writei(..) filewrite(..) end_op()
Not actually ‘committed’
until we save log header
Write from cache to log
Read log header from disk

Change, then write-back


Is the log actually
‘committed’ at this point?
Step #5: Write cache to disk and delete log entry
writei(..) filewrite(..) end_op()

Read log src and


disk dst blocks

Write-back to dst

Remove the log to avoid


re-copy on crash Is the store to disk “actually
committed” at this point?
Aspects of FS logging can become challenging

§ All data at system call is first written to in-memory block cache

§ Then transferred to the log disk space inside your disk

§ Also, FS system calls can happen concurrently

Can you spot the major challenges to ensure smoothness?


Challenges of logging?

§ System call data must fit inside the log disk space

§ Recall that each write(..) puts all blocks into log regions

§ xv6’s solutions:

1. Set an upper bound of all log entries


§ Set log size >= upper bound

2. Breakdown large writes into smaller transactions


§ Problem: large writes are thus not atomic
§ However, overall system remains in a consistent state
Challenges of logging?

§ Concurrent system call handling

1. Must allow different FS system calls from different proc at the same time
2. Block may be written multiple times while still in-memory
§ Creating a new log entry for each time is wasteful

§ xv6’s solutions:

1. Check if a system call’s data can fit in log before starting it


§ Else, sleep and wait for log to free up

2. If the block is already assigned a log entry, return without creating new entry
§ Also called “write absorption” à absorb multiple writes into a single
Pros of xv6’s logging?

Ø FS internal invariants are maintained


§ Metadata remains consistent (e.g., no two inodes point to same blocks)

Ø Except for the last few operations, data is preserved on the disk
§ Every system call is written at the end before return
§ No surprises of losing data written way back

Ø Write order is preserved


§ $ echo A > X ; echo B > Y
§ X will be written before Y
Cons of xv6’s crash recovery method?

§ Several inefficiencies Design choice: ensure either old or


new copy is correct. No corruption
§ Every block is written twice (once for log + once for final)

§ Eagerly writes to disk after every write system call

§ Writes each block one-by-one to disk

Implementation choice: How would you overcome?


Keep data in-memory for longer
periods and batch disk writes
Ordered mode in ext3/4 FS avoids twice data writes

§ Log the FS metadata but write data directly to the final disk location
§ Crash consistency, but not recovery

§ High-level overview of ordered mode:


1. Metadata is committed to log
2. Data is written to final disk location
3. Metadata is moved from log to final disk location

§ Limitation?

§ Suggested reading(s):
Journaling the Linux ext2fs: https://pdos.csail.mit.edu/6.1810/2022/readings/journal-
design.pdf
Analysis and Evolution of Journaling File Systems (Prabhakaran et. al)
Questions? Otherwise, see you next class!

You might also like