ADVANCED
PROGRAMMING
TECHNIQUES
VNU - UNIVERSITY of ENGINEERING & TECHNOLOGY
LECTURE 2: Programming File I/O in Linux
CONTENTS
> Fundamental concepts
> Working with files
> File metadata operations
> Working with directories
> Buffered I/O
File
Computers store information on various persistent storage media
The OS abstracts from the physical devices to define a file = a logical storage unit
> Linux treats everything as a file, providing a consistent way (tools
+ commands) to interact with various system resources.
▪ Regular file: a sequence of bytes stored on a medium, e.g. texts, images, etc.
▪ Any object or HW device that reads/writes bytes
✓ Terminals
✓ Pipes
✓ Sockets
File Attributes
> Standard file attributes
▪ Name: a file is a named collection of related information
▪ Permissions: read (r)/write (w)/execute (x) rights of a file
▪ Ownership: owner (u)/group (g)/other (o)
▪ Time stamps: e.g., atime/mtime/ctime
▪ Size
> Extended file attributes
▪ E.g., immutable, append-only, no atime updates, etc.
▪ Check extended attributes with the shell command: lsattr filename
> Special file attributes (for executables)
▪ E.g., SetUID, SetGID, Sticky Bit.
Linux File System
> Files are organized as a hierarchy anchored by root directory (/)
▪ A directory consists of an array of links, each of which maps a filename to a file
▪ Each directory has at least 2 links: to itself (.) & to the parent directory (..)
Pathnames
> Denote the locations of files in the namespace hierarchy
▪ Absolute pathname denotes path from root
▪ Relative pathname denotes path from current working directory (cwd)
▪ Kernel maintains current working directory for each process
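cwd: /home/bryant

/
├── bin/
│   └── bash
├── dev/
│   └── tty1
├── etc/
│   ├── group
│   └── passwd
├── home/
│   ├── droh/
│   │   └── hello.c
│   └── bryant/
└── usr/
    ├── include/
    │   ├── stdio.h
    │   └── sys/
    │       └── unistd.h
    └── bin/
        └── vim

What are the absolute and relative pathnames of hello.c?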
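To make the role of the current working directory concrete, here is a minimal sketch in C. The helper name change_and_get_cwd is hypothetical (it is not part of any library); it only combines the standard chdir() and getcwd() calls.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: change the process's cwd to `dir`, then store the
   resulting absolute pathname of the cwd in `out`.
   Returns 0 on success, -1 on failure. */
int change_and_get_cwd(const char *dir, char *out, size_t outlen)
{
    if (chdir(dir) < 0) {            /* `dir` may be absolute or relative */
        perror("chdir");
        return -1;
    }
    if (getcwd(out, outlen) == NULL) {
        perror("getcwd");
        return -1;
    }
    return 0;
}
```

Every relative pathname passed to a later system call is resolved against the directory reported by getcwd(). For instance, if the cwd is /home/bryant and hello.c sits in /home/droh, then ../droh/hello.c and /home/droh/hello.c would name the same file.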
File I/O Operations
> C programs usually use stdio package for file I/O.
> Library functions layered on top of I/O system calls
▪ We presume understanding of stdio → focus on system calls.
System calls                 Library functions
file descriptor (int)        file stream (FILE *)
open(), close()              fopen(), fclose()
lseek()                      fseek(), ftell()
read()                       fgets(), fscanf(), fread(), …
write()                      fputs(), fprintf(), fwrite(), …
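The two columns are connected: every stdio stream wraps a file descriptor, which fileno() exposes. The sketch below is illustrative only, and the function name write_both_ways is an assumption made for this example.

```c
#include <stdio.h>
#include <unistd.h>

/* Send `msg` to the same file twice: once through the stdio stream and
   once through its underlying file descriptor.
   Returns the descriptor number, or -1 on error. */
int write_both_ways(FILE *stream, const char *msg, size_t len)
{
    int fd = fileno(stream);          /* FILE * -> int descriptor */
    if (fd < 0)
        return -1;

    fwrite(msg, 1, len, stream);      /* library function: goes to stdio's buffer */
    fflush(stream);                   /* push the buffer to the kernel first */

    if (write(fd, msg, len) < 0)      /* system call: goes straight to the kernel */
        return -1;
    return fd;
}
```

Calling write_both_ways(stdout, "hi\n", 3) should print the line twice and return 1, the descriptor behind stdout. The fflush() is there so the two copies appear in program order; without it the buffered copy could be written later.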
File Descriptors & Open-file Table
> Linux requires that a file must be opened before its first use
▪ The Linux kernel handles open files via their descriptors (small integers) and
keeps a small table containing information about all open files.
▪ This avoids searching the directory every time an operation is required.
[Figure: in user space, open() on foo.txt returns a file descriptor; the descriptor is an index into the open-file table in kernel space, whose entry refers to the file on secondary storage]
> Each process created by a Linux shell begins life with 3 open files
FD   Purpose           POSIX name       stdio stream
0    Standard input    STDIN_FILENO     stdin
1    Standard output   STDOUT_FILENO    stdout
2    Standard error    STDERR_FILENO    stderr
Opening Files
> Informs the kernel that you are getting ready to access that file
#include <fcntl.h>     /* open(), O_RDONLY */
#include <stdio.h>     /* perror() */
#include <stdlib.h>    /* exit() */
#include <sys/stat.h>

int fd; /* file descriptor */

if ((fd = open("/etc/hosts", O_RDONLY)) < 0) {
    perror("open");
    exit(1);
}
> Returns the file descriptor, normally a small nonnegative integer
▪ Guaranteed to be lowest available FD
▪ fd == -1 indicates that an error occurred
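The "lowest available FD" guarantee can be observed directly. The following is a small sketch, assuming a single-threaded program and that /dev/null exists; the function name reuses_lowest_fd is made up for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open /dev/null twice, close the first descriptor, then open again.
   Returns 1 if the third open() reused the number freed by close(),
   0 if it did not, and -1 on error. */
int reuses_lowest_fd(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0) {
        perror("open");
        return -1;
    }
    int second = open("/dev/null", O_RDONLY);
    if (second < 0) {
        perror("open");
        close(first);
        return -1;
    }

    close(first);                              /* frees the lower number */

    int third = open("/dev/null", O_RDONLY);   /* lowest free FD is now `first` */
    if (third < 0) {
        perror("open");
        close(second);
        return -1;
    }

    int reused = (third == first);
    close(second);
    close(third);
    return reused;
}
```

This behavior is what makes the classic redirection idiom work: closing descriptor 1 and then opening a file makes that file the new standard output.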
Closing Files
> Informs the kernel that you are finished accessing that file
#include <stdio.h>     /* perror() */
#include <stdlib.h>    /* exit() */
#include <unistd.h>    /* close() */

int fd;     /* file descriptor */
int retval; /* return value */

if ((retval = close(fd)) < 0) {
    perror("close");
    exit(1);
}
> Closing an already closed file is a recipe for disaster in
threaded programs (more on this later)
▪ On Linux, close() always releases the FD, even when it returns an error, so do not retry it.
> Always check return codes
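One common way to guard against closing the same descriptor twice is to invalidate the variable right after the call. The helper below, close_once, is a hypothetical name used for illustration and is one possible convention, not a standard API.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: close *fdp at most once.
   After the call *fdp is -1, so a second call is a harmless no-op.
   Returns the result of close(), or 0 if there was nothing to close. */
int close_once(int *fdp)
{
    if (*fdp < 0)
        return 0;            /* already closed through this helper */

    int rc = close(*fdp);
    *fdp = -1;               /* on Linux the FD is released even if close() failed */
    if (rc < 0)
        perror("close");
    return rc;
}
```

The guard only protects copies of the descriptor that go through the same variable; in threaded programs, a number freed by one thread can be handed out again by open() in another, which is why a stray second close() is so dangerous.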
Reading Files
> Copies bytes from the current file position to memory, and then
updates file position
#include <stdio.h>     /* perror() */
#include <stdlib.h>    /* exit() */
#include <unistd.h>    /* read() */

char buf[512];
int fd;         /* file descriptor */
ssize_t nbytes; /* number of bytes read */

/* Open file fd ... */
/* Then read up to 512 bytes from file fd */
if ((nbytes = read(fd, buf, sizeof(buf))) < 0) {
    perror("read");
    exit(1);
}
> Returns number of bytes read from file fd into buf
▪ nbytes < 0 indicates that an error occurred
▪ nbytes == 0 indicates end-of-file (EOF)
▪ Short counts (nbytes < sizeof(buf) ) are possible and are not errors!
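Because a single read() may return fewer bytes than requested, code that needs the whole input usually loops until read() returns 0. A minimal sketch, with count_bytes as an illustrative name:

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Count the bytes available from fd by reading until end-of-file.
   Returns the total, or -1 on error. */
long count_bytes(int fd)
{
    char buf[512];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;          /* possibly a short count: just keep going */
        } else if (n == 0) {
            break;               /* end-of-file */
        } else if (errno != EINTR) {
            perror("read");
            return -1;
        }
        /* n < 0 with EINTR: interrupted by a signal, so retry */
    }
    return total;
}
```

The loop treats a short count exactly like a full one, which is why it works unchanged on files, pipes, terminals, and sockets.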
Writing Files
> Copies bytes from memory to the current file position, and then
updates current file position
#include <stdio.h>     /* perror() */
#include <stdlib.h>    /* exit() */
#include <unistd.h>    /* write() */

char buf[512];
int fd;         /* file descriptor */
ssize_t nbytes; /* number of bytes written */

/* Open the file fd ... */
/* Then write up to 512 bytes from buf to file fd */
if ((nbytes = write(fd, buf, sizeof(buf))) < 0) {
    perror("write");
    exit(1);
}
> Returns number of bytes written from buf to file fd
▪ nbytes < 0 indicates that an error occurred
▪ As with reads, short counts are possible and are not errors!
On Short Counts
> Short counts can occur in these situations:
▪ Encountering end-of-file (EOF) on reads
▪ Reading text lines from a terminal
▪ Reading and writing network sockets
> Short counts normally do not occur in these situations:
▪ Reading from disk files (except for EOF)
▪ Writing to disk files
> Best practice is to always allow for short counts.
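On the write side, allowing for short counts usually means retrying with the unwritten remainder. The function below is a sketch of that pattern; write_all is a name chosen for this example, not a system call.

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly `len` bytes from `buf` to fd, retrying after short counts.
   Returns 0 on success, -1 on error (errno is left as set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                  /* skip what was already written */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would use write_all(fd, msg, strlen(msg)) wherever a plain write() would otherwise silently drop the tail of a message on a socket or pipe.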
Seeking in a File
> Moves the current read/write position within an open file
▪ E.g., media player seeks position in a video file
[Figure: a file as a byte sequence B0 B1 ... Bk-1 Bk Bk+1 ..., with current file position = k]
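#include <stdio.h>     /* perror() */
#include <stdlib.h>    /* exit() */
#include <unistd.h>    /* lseek(), SEEK_SET */

int fd;       /* file descriptor */
off_t offset; /* resulting offset from the start of the file */

/* Open the file fd ... */
/* Then move to byte offset 10 (the 11th byte) from the beginning */
if ((offset = lseek(fd, 10, SEEK_SET)) < 0) {
    perror("lseek");
    exit(1);
}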
File Metadata Operations
> Per-file metadata (data about the file's data) is maintained by the kernel
▪ Accessed by users with the stat/fstat functions
/* Metadata returned by the stat and fstat functions */
struct stat {
    dev_t         st_dev;     /* Device */
    ino_t         st_ino;     /* inode */
    mode_t        st_mode;    /* Protection and file type */
    nlink_t       st_nlink;   /* Number of hard links */
    uid_t         st_uid;     /* User ID of owner */
    gid_t         st_gid;     /* Group ID of owner */
    dev_t         st_rdev;    /* Device type (if inode device) */
    off_t         st_size;    /* Total size, in bytes */
    unsigned long st_blksize; /* Blocksize for filesystem I/O */
    unsigned long st_blocks;  /* Number of blocks allocated */
    time_t        st_atime;   /* Time of last access */
    time_t        st_mtime;   /* Time of last modification (file data) */
    time_t        st_ctime;   /* Time of last status change (inode) */
};
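As a sketch of how these fields are typically consumed, the function below classifies a path using st_mode and the standard S_ISDIR/S_ISREG macros. The function name file_kind and the returned strings are assumptions made for this example.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Classify `path` using its st_mode field.
   Returns "directory", "regular file", or "other"; NULL if stat() fails. */
const char *file_kind(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0) {
        perror("stat");
        return NULL;
    }
    if (S_ISDIR(st.st_mode))
        return "directory";
    if (S_ISREG(st.st_mode))
        return "regular file";
    return "other";              /* device, pipe, socket, symlink target, ... */
}
```

fstat() fills the same structure but takes an already open file descriptor instead of a pathname.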
Accessing Directories
> Only recommended operation on a directory: read its entries
▪ dirent structure contains information about a directory entry
▪ DIR structure contains information about the directory while walking through it
#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

{
    DIR *directory;
    struct dirent *de;
    ...
    if (!(directory = opendir(dir_name))) {
        perror("Failed to open directory");
        exit(1);
    }
    ...
    /* readdir() returns NULL when there are no more entries */
    while ((de = readdir(directory)) != NULL) {
        printf("Found file: %s\n", de->d_name);
    }
    ...
    closedir(directory);
}
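A self-contained variant of the same walk is sketched below. It counts entries instead of printing them and skips the "." and ".." links that every directory contains; count_entries is an illustrative name.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count the entries in `dir_name`, not counting "." and "..".
   Returns the count, or -1 if the directory cannot be opened. */
int count_entries(const char *dir_name)
{
    DIR *directory = opendir(dir_name);
    if (directory == NULL) {
        perror("opendir");
        return -1;
    }

    int count = 0;
    struct dirent *de;
    while ((de = readdir(directory)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;            /* links to the directory itself and its parent */
        count++;
    }

    closedir(directory);
    return count;
}
```

Note that readdir() returns entries in no particular order, so programs that need sorted output have to sort the names themselves.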
Buffered I/O
> Applications often read/write one character at a time
▪ getc, putc, ungetc
▪ gets, fgets: read line of text one character at a time, stopping at newline
> Implementing each of these calls as a Linux system call is expensive
▪ A system call costs > 10,000 clock cycles
> Solution: buffered read
▪ Use read system call to grab block of bytes
▪ User input functions take one byte at a time from buffer, refill buffer when empty
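[Figure: the file is divided into "not in buffer", "already read", "unread", and "unseen" regions; the already-read and unread bytes form the buffered portion, which ends at the current file position]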
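The buffered-read idea can be sketched in a few lines. The names below (struct bufreader, br_init, br_getc) are hypothetical and only illustrate the technique; stdio's real implementation is more elaborate.

```c
#include <unistd.h>

#define BR_BUF_SIZE 4096

/* Hypothetical buffered reader: one read() fills buf, then bytes are
   handed out one at a time without further system calls. */
struct bufreader {
    int     fd;                  /* descriptor to read from */
    ssize_t count;               /* unread bytes left in buf */
    char   *next;                /* next unread byte in buf */
    char    buf[BR_BUF_SIZE];
};

void br_init(struct bufreader *br, int fd)
{
    br->fd = fd;
    br->count = 0;
    br->next = br->buf;
}

/* Return the next byte as a value in 0..255, or -1 on EOF or error. */
int br_getc(struct bufreader *br)
{
    if (br->count <= 0) {                        /* buffer empty: refill */
        br->count = read(br->fd, br->buf, sizeof(br->buf));
        if (br->count <= 0)
            return -1;                           /* EOF (0) or error (<0) */
        br->next = br->buf;
    }
    br->count--;
    return (unsigned char)*br->next++;
}
```

With a 4096-byte buffer, reading a 1 MB file byte by byte costs about 256 read() calls instead of about one million.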
Buffering in Standard I/O
> Standard I/O functions use buffered I/O
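printf("h");
printf("e");
printf("l");
printf("l");
printf("o");
printf("\n");
fflush(stdout);
[Figure: the stdio buffer buf holds h e l l o \n; flushing it issues a single write(1, buf, 6)]
> The buffer is flushed to the output fd on "\n" (when stdout is line-buffered, e.g. a terminal) or by a call to fflush or exit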
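The effect of stdio buffering can be observed by checking a file's size before and after fflush(). The sketch below requests full buffering explicitly with setvbuf() so the outcome does not depend on the library's default; buffered_write_demo and size_of are names invented for this example.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size of `path` in bytes, or -1 if stat() fails. */
static long size_of(const char *path)
{
    struct stat st;
    return stat(path, &st) < 0 ? -1 : (long)st.st_size;
}

/* Write "hello\n" one character at a time to `path`, recording the
   on-disk size just before and just after fflush().
   Returns 0 on success, -1 on failure. */
int buffered_write_demo(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        perror("fopen");
        return -1;
    }
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);   /* fully buffered: "\n" does not flush */

    for (const char *p = "hello\n"; *p != '\0'; p++)
        fputc(*p, fp);                   /* stored in stdio's buffer only */

    *before = size_of(path);             /* nothing has reached the kernel yet */
    fflush(fp);                          /* one write() carries all 6 bytes */
    *after = size_of(path);

    fclose(fp);
    return 0;
}
```

Under these assumptions *before is 0 and *after is 6: six library calls result in a single system call.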
Memory-mapping I/O
> Maps a file into memory, so it can be accessed as a memory array.
▪ Implemented in the mmap function
▪ Only loads the required parts into RAM via on-demand loading (paging)
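A minimal sketch of reading a file through mmap() follows. It maps the whole file read-only and scans it as an ordinary array; count_newlines is an illustrative name, and the sketch assumes a regular file that fits in the address space.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the '\n' bytes in `path` by mapping the file into memory.
   Returns the count, or -1 on error. */
long count_newlines(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {            /* the mapping length is the file size */
        perror("fstat");
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {               /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return -1;
    }
    close(fd);                           /* the mapping remains valid after close */

    long lines = 0;
    for (off_t i = 0; i < st.st_size; i++)   /* pages are loaded on first touch */
        if (data[i] == '\n')
            lines++;

    munmap(data, (size_t)st.st_size);
    return lines;
}
```

There is no read() call in the loop: touching data[i] is what triggers the kernel to bring the corresponding page into RAM.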
Buffered vs. Memory-mapping I/O
> Comparison summary
Feature                  Buffered I/O                                                   Memory-mapping I/O
Performance              Good for sequential access                                     Great for random access; eliminates extra copying
System Calls             Requires explicit read() and write() calls                     Implicit: accessing memory triggers page loads
Memory Usage             Uses a kernel buffer first, then copies to a user-space buffer Directly maps the file into memory
Best Use Case            Sequential reads/writes, small files                           Large files, random access, memory shared by processes
Programming complexity   Simple and widely used                                         More complex, needs careful handling
NEXT LECTURE
[Flipped class] Working with processes
> Pre-class
▪ Study pre-class materials on Canvas
> In class
▪ Reinforcement/enrichment discussion
> Post class
▪ Homework
▪ Preparation for Lab 1
▪ Consultation (if needed)