CSE 330:
Operating Systems
Adil Ahmad
Lecture #20: Filesystem logging (crash recovery and consistency)
FS optimization(s) recap
What is the difference between the two workloads?
Ø Spatial locality: addresses are close to each other in “space”
Ø Temporal locality: addresses are close to each other in “time”
Which of the following is actually better for disks?
Ø Spatial locality: addresses are close to each other in “space”
Ø Temporal locality: addresses are close to each other in “time”
Imagine that we must choose inode and data blocks
Ø Is this a good or a bad policy to choose blocks?
Ø Bad policy; will require slow “random” access
A better policy would be as follows
Ø Is there something else that we can optimize?
Ø Location of the inode itself, in this particular case.
An even better policy
FS optimization: buffering in memory
Avoiding excessive I/O operations is important
§ Even with all possible disk optimizations, I/O remains slow and
unpredictable depending on workloads
§ File system will cache “files” in memory (recall how the xv6 block layer
caches “blocks”) to improve performance
§ To do so, the file system will:
Ø Reserve a region for the “file cache”
Ø Find out which files are frequently read/written
Ø Keep those files within the cache
Modern OSs take a “unified” cache approach
§ Keep a single cache for (a) file pages and (b) program memory pages
§ Avoids having multiple caches, and separately dealing with them
Buffering creates a tiny headache for the file system…
§ Imagine the following set of events from an FS:
1. Program writes to a new 1G file (e.g., bar)
2. FS caches all writes to the file in-memory
3. FS updates the inodes within the disk
4. FS writes all changes to data blocks within disk (e.g., block-by-block using Linux’s block I/O)
§ Can you spot potential problems?
Ø If power goes off after step 3, the system has already reserved the data blocks → FS has lost a few free data blocks
Worse problems could also occur on crashes
§ Imagine the following set of events from an FS:
1. The FS is deleting a file X
2. FS frees X’s block and updates the data bitmap
3. Before the FS can delete the inode pointing to that block, power goes off
4. OS restarts; it thinks the block is free and allocates it to a new file Y
§ Can you spot potential problems?
Ø Two inodes now point to the same data block → writes to the two different files will overwrite each other
§ Can this be a security issue?
Ø Yes, imagine two different users get allocated the blocks
Crash recovery is a significant problem for FSs
§ Many filesystem operations require multiple writes to disk
§ creat(file) → 5 disk blocks to be written
§ inode, directory, and actual file data blocks
§ A crash (e.g., power failure) after a subset of these writes might put the
filesystem in an inconsistent state
§ Note this is only a problem for write operations. Why is that?
Ø Data/Inode blocks only get changed on writes, not reads.
How would you solve this problem of FS corruption on crashes?
§ Recall the two scenarios above:
Ø A crash after updating inodes but before writing data blocks loses the reserved data blocks
Ø A crash between freeing a block and deleting its inode leaves two inodes pointing to the same data block
Ensuring correctness after system crash
Filesystem logging addresses crash problems
Let’s look at the high-level idea of logging:
§ Never write directly to the on-disk data structures
§ Place all writes to a “log” on the disk
§ Once all log writes in one operation are complete, issue a “commit”
§ FS writes the data to actual disk locations using “log”
§ FS removes “log” entry
Does logging prevent corruption on crashes?
Let’s look at the crash behavior at each step of logging:
§ Never write directly to the on-disk data structures
§ Place all writes to a “log” on the disk (crash here → discard all log writes on reboot)
§ Once all log writes in one operation are complete, issue a “commit” (all needed data is now on disk to complete the write)
§ FS writes the data to actual disk locations using the “log” (crash here → recover the data using the log)
§ FS removes the “log” entry
Ensures atomicity: either all writes in a transaction are applied, or none are
Where is the log stored on disk?
Ø We must reserve a fixed region of the disk for the log as well, and note its information (e.g., starting block, size) in the superblock
Code review: xv6’s crash recovery through logging
Step #1: Signaling start of log at the write system call
Step #2: Writing data to in-memory buffers
Step #3: Tracking the buffers that must be flushed to disk
Step #4: Commit cache to log disk sectors
Step #5: Moving data from log to final disk locations
Step #1: Signaling the start of a log at the write syscall
§ The write(..) system call signals the start of logging before performing the writes (xv6’s begin_op())
§ Before starting, it checks: is the log being committed? Is there enough space in the log?
§ A matching end_op() is called at the end, after the writes are performed
Step #2: Writing data to in-memory buffers
§ writei(..), called from filewrite(), performs the actual writes
§ It first tries to find a cache buffer to keep the block in-memory, reading the block from disk if needed
§ It then copies the data from userspace into the buffer (similar to the copy_from_user(..) you have been using for kernel modules)
Step #3: Tracking that a buffer must be written later
§ Must keep metadata for the log:

    struct logheader {
        int n;
        int block[LOGSIZE];
    };

    struct log {
        ...
        struct logheader lh;
    };

§ log_write(..) places each block into an empty log block, reusing the entry if the block is already allocated one
Step #4: Commit cache to log sectors (in end_op(), after writei(..)/filewrite(..) return)
§ Write the modified blocks from the cache to the log region on disk
§ Read the log header from disk, change it, then write it back
§ The transaction is not actually ‘committed’ until we save the log header
§ Is the log actually ‘committed’ at this point? Yes: the header write is the commit point
Step #5: Write cache to disk and delete log entry (also in end_op())
§ Read each log src block and its disk dst block
§ Write the data back to dst
§ Remove the log entry to avoid a re-copy on crash
§ Is the store to disk “actually committed” at this point? It was already committed in Step #4; a crash here only causes the log to be replayed on reboot
Aspects of FS logging can become challenging
§ All data at system call is first written to in-memory block cache
§ Then transferred to the log disk space inside your disk
§ Also, FS system calls can happen concurrently
Can you spot the major challenges to ensure smoothness?
Challenges of logging?
§ System call data must fit inside the log disk space
§ Recall that each write(..) puts all blocks into log regions
§ xv6’s solutions:
1. Set an upper bound on the number of log entries one operation may need
§ Set log size >= upper bound
2. Break down large writes into smaller transactions
§ Problem: large writes are thus not atomic
§ However, overall system remains in a consistent state
Challenges of logging?
§ Concurrent system call handling
1. Must allow FS system calls from different processes at the same time
2. Block may be written multiple times while still in-memory
§ Creating a new log entry for each time is wasteful
§ xv6’s solutions:
1. Check if a system call’s data can fit in log before starting it
§ Else, sleep and wait for log to free up
2. If the block is already assigned a log entry, return without creating a new entry
§ Also called “write absorption” → absorb multiple writes into a single log entry
Pros of xv6’s logging?
Ø FS internal invariants are maintained
§ Metadata remains consistent (e.g., no two inodes point to same blocks)
Ø Except for the last few operations, data is preserved on the disk
§ Every system call’s writes reach the disk before the call returns
§ No surprise loss of data that was written long ago
Ø Write order is preserved
§ $ echo A > X ; echo B > Y
§ X will be written before Y
Cons of xv6’s crash recovery method?
§ Several inefficiencies:
§ Every block is written twice (once to the log + once to the final location)
Ø Design choice: ensures either the old or the new copy is correct; no corruption
§ Eagerly writes to disk after every write system call
Ø Implementation choice: how would you overcome this? Keep data in-memory for longer periods and batch disk writes
§ Writes each block one-by-one to disk
Ordered mode in ext3/4 FS avoids writing data twice
§ Log the FS metadata, but write data directly to its final disk location
§ Crash consistency, but not data recovery
§ High-level overview of ordered mode:
1. Metadata is committed to log
2. Data is written to final disk location
3. Metadata is moved from log to final disk location
§ Limitation?
§ Suggested reading(s):
Ø Journaling the Linux ext2fs: https://pdos.csail.mit.edu/6.1810/2022/readings/journal-design.pdf
Ø Analysis and Evolution of Journaling File Systems (Prabhakaran et al.)
Questions? Otherwise, see you next class!