
APT02 2024S2 New

Uploaded by

minhtrongc31120

ADVANCED PROGRAMMING TECHNIQUES

VNU - UNIVERSITY of ENGINEERING & TECHNOLOGY


LECTURE 2: Programming File I/O in Linux

CONTENTS

> Fundamental concepts
> Working with files
> File metadata operations
> Working with directories
> Buffered I/O
File
Computers store information on various persistent storage media

The OS abstracts from the physical devices to define a file = a logical storage unit

> Linux treats everything as a file, providing a consistent way (tools + commands) to interact with various system resources.
▪ Regular file: a sequence of bytes stored on a medium, e.g. texts, images, etc.
▪ Any object or HW device that reads/writes bytes
✓ Terminals
✓ Pipes
✓ Sockets
File Attributes

> Standard file attributes


▪ Name: a file is a named collection of related information
▪ Permissions: read (r)/write (w)/execute (x) rights of a file
▪ Ownership: permissions apply to three classes: owner/user (u), group (g), other (o)
▪ Time stamps: e.g., atime/mtime/ctime
▪ Size

> Extended file attributes


▪ E.g., immutable, append-only, no atime updates, etc.
▪ Check extended attributes with the shell command: lsattr filename

> Special file attributes (for executables)


▪ E.g., SetUID, SetGID, Sticky Bit.
Linux File System
> Files are organized as a hierarchy anchored by root directory (/)
▪ A directory consists of an array of links, each link maps a filename to a file
▪ Each directory has at least 2 links: to itself (.) & to the parent directory (..)
Pathnames
> Denote the locations of files in the namespace hierarchy
▪ Absolute pathname denotes path from root
▪ Relative pathname denotes path from current working directory (cwd)
▪ Kernel maintains current working directory for each process

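[Figure: directory tree rooted at /, with cwd = /home/bryant. Level 1: bin/, dev/, etc/, home/, usr/. Level 2: bash, tty1, group, passwd, droh/, bryant/, include/, bin/. Level 3: hello.c, stdio.h, sys/, vim. Level 4: unistd.h.]

What are the absolute and relative pathnames of hello.c?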
File I/O Operations

> C programs usually use stdio package for file I/O.


> Library functions layered on top of I/O system calls
▪ We presume understanding of stdio → focus on system calls.

System calls              Library functions
file descriptor (int)     file stream (FILE *)
open(), close()           fopen(), fclose()
lseek()                   fseek(), ftell()
read()                    fgets(), fscanf(), fread(), …
write()                   fputs(), fprintf(), fwrite(), …
File Descriptors & Open-file Table
> Linux requires that a file be opened before its first use
▪ The Linux kernel handles open files via their descriptors (small integers) and keeps a small table containing information about all open files.
▪ This avoids searching the directory every time an operation is requested.

[Figure: in user space, open() on foo.txt returns a file descriptor; the descriptor is an index into the open-file table in kernel space, whose entry refers to the file on secondary storage.]

> Each process created by a Linux shell begins life with 3 open files
FD   Purpose           POSIX name       stdio stream
0    Standard input    STDIN_FILENO     stdin
1    Standard output   STDOUT_FILENO    stdout
2    Standard error    STDERR_FILENO    stderr
Opening Files

> Informs the kernel that you are getting ready to access that file

#include <stdio.h>    /* perror */
#include <stdlib.h>   /* exit */
#include <sys/stat.h>
#include <fcntl.h>    /* open, O_RDONLY */
int fd; /* file descriptor */

if ((fd = open("/etc/hosts", O_RDONLY)) < 0) {
    perror("open");
    exit(1);
}

> Returns the file descriptor, normally a small nonnegative integer
▪ Guaranteed to be the lowest available FD
▪ fd == -1 indicates that an error occurred
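
The "lowest available FD" rule can be observed directly. The following is a minimal sketch, not part of the original slides: the helper name fd_is_reused is hypothetical, and it assumes a single-threaded program so that no other thread opens a file in between.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` read-only twice, closing the first
 * descriptor in between, and reports whether the same number came back.
 * Returns 1 if the number was reused, 0 if not, -1 if open() failed. */
int fd_is_reused(const char *path)
{
    assert(path != NULL);

    int first = open(path, O_RDONLY);
    if (first < 0)
        return -1;
    close(first);                      /* `first` is now the lowest free FD */

    int second = open(path, O_RDONLY); /* expected to get that number back */
    if (second < 0)
        return -1;
    close(second);

    return first == second;
}
```

Calling fd_is_reused("/dev/null") from a single-threaded program should return 1, because the descriptor released by close() is the lowest free one at the time of the second open().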
Closing Files

> Informs the kernel that you are finished accessing that file
#include <stdio.h>    /* perror */
#include <stdlib.h>   /* exit */
#include <unistd.h>   /* close */
int fd; /* file descriptor */
int retval; /* return value */

if ((retval = close(fd)) < 0) {
    perror("close");
    exit(1);
}

> Closing an already closed file is a recipe for disaster in threaded programs (more on this later)
▪ On Linux, close() always releases the FD, even when it returns failure.

> Always check return codes


Reading Files
> Copies bytes from the current file position to memory, and then updates the file position
#include <stdio.h>    /* perror */
#include <stdlib.h>   /* exit */
#include <unistd.h>   /* read, ssize_t */
char buf[512];
int fd; /* file descriptor */
ssize_t nbytes; /* number of bytes read (signed, so -1 can signal an error) */

/* Open file fd ... */

/* Then read up to 512 bytes from file fd */
if ((nbytes = read(fd, buf, sizeof(buf))) < 0) {
    perror("read");
    exit(1);
}

> Returns number of bytes read from file fd into buf


▪ nbytes < 0 indicates that an error occurred
▪ Short counts (nbytes < sizeof(buf) ) are possible and are not errors!
Writing Files
> Copies bytes from memory to the current file position, and then updates the current file position
#include <stdio.h>    /* perror */
#include <stdlib.h>   /* exit */
#include <unistd.h>   /* write, ssize_t */
char buf[512];
int fd; /* file descriptor */
ssize_t nbytes; /* number of bytes written */

/* Open the file fd ... */

/* Then write up to 512 bytes from buf to file fd */
if ((nbytes = write(fd, buf, sizeof(buf))) < 0) {
    perror("write");
    exit(1);
}

> Returns number of bytes written from buf to file fd


▪ nbytes < 0 indicates that an error occurred
▪ As with reads, short counts are possible and are not errors!
On Short Counts

> Short counts can occur in these situations:


▪ Encountering end-of-file (EOF) on reads
▪ Reading text lines from a terminal
▪ Reading and writing network sockets

> Short counts never occur in these situations:


▪ Reading from disk files (except for EOF)
▪ Writing to disk files

> Best practice is to always allow for short counts.
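
One common way to allow for short counts on writes is to loop until every byte has been accepted. The sketch below is illustrative only: the name write_all is hypothetical (not from the slides), and restarting after EINTR is a design choice of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write() until all n bytes of buf
 * have been written to fd. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;

    assert(buf != NULL);
    while (n > 0) {
        ssize_t written = write(fd, p, n);
        if (written < 0) {
            if (errno == EINTR)   /* interrupted by a signal: try again */
                continue;
            return -1;            /* real error, errno is set by write() */
        }
        p += written;             /* skip the part already written */
        n -= (size_t)written;     /* a short count just shrinks the rest */
    }
    return 0;
}
```

A call such as write_all(STDOUT_FILENO, "hello\n", 6) behaves like a single write() when no short count occurs, and transparently retries the remainder when one does.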


Seeking in a File

> Moves the current read/write position within an open file
▪ E.g., media player seeks position in a video file

[Figure: file bytes B0, B1, ..., Bk-1, Bk, Bk+1, ...; current file position = k]

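#include <stdio.h>      /* perror */
#include <stdlib.h>     /* exit */
#include <sys/types.h>  /* off_t */
#include <fcntl.h>
#include <unistd.h>     /* lseek */
int fd; /* file descriptor */
off_t offset; /* resulting file position */

/* Open the file fd ... */

/* Then move to byte offset 10 from the beginning */
if ((offset = lseek(fd, 10, SEEK_SET)) < 0) {
    perror("lseek");
    exit(1);
}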
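
Seeking and reading can be combined to fetch a byte at an arbitrary position. This is a minimal sketch under stated assumptions: the helper name byte_at_path is hypothetical, and the file must be seekable (a regular file, not a pipe or terminal).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the byte stored at offset `pos` of the
 * file `path` as a value in 0..255, or -1 on error or if `pos` lies at
 * or beyond end-of-file. */
int byte_at_path(const char *path, off_t pos)
{
    unsigned char c;
    int result = -1;

    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* lseek() only moves the file position; read() then starts there.
     * Seeking past EOF succeeds, but the read() returns 0 bytes. */
    if (lseek(fd, pos, SEEK_SET) != (off_t)-1 && read(fd, &c, 1) == 1)
        result = c;

    close(fd);
    return result;
}
```

For a file containing "0123456789ABCDEF", byte_at_path(path, 10) yields 'A', since offsets start at 0.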
File Metadata Operations

> Per-file metadata is information about a file (not its contents), maintained by the kernel
▪ Accessed by users with the stat/fstat functions

/* Metadata returned by the stat and fstat functions */


struct stat {
dev_t st_dev; /* Device */
ino_t st_ino; /* inode */
mode_t st_mode; /* Protection and file type */
nlink_t st_nlink; /* Number of hard links */
uid_t st_uid; /* User ID of owner */
gid_t st_gid; /* Group ID of owner */
dev_t st_rdev; /* Device type (if inode device) */
off_t st_size; /* Total size, in bytes */
unsigned long st_blksize; /* Blocksize for filesystem I/O */
unsigned long st_blocks; /* Number of blocks allocated */
time_t st_atime; /* Time of last access */
time_t st_mtime; /* Time of last modification */
time_t st_ctime; /* Time of last change */
};
Accessing Directories
> Only recommended operation on a directory: read its entries
▪ dirent structure contains information about a directory entry
▪ DIR structure contains information about the directory while walking through it
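#include <stdio.h>      /* printf, perror */
#include <stdlib.h>     /* exit */
#include <sys/types.h>
#include <dirent.h>     /* opendir, readdir, closedir */

{
    DIR *directory;
    struct dirent *de;
    ...
    if (!(directory = opendir(dir_name))) {
        perror("Failed to open directory");
        exit(1);
    }
    ...
    /* readdir returns NULL when there are no more entries */
    while ((de = readdir(directory)) != NULL) {
        printf("Found file: %s\n", de->d_name);
    }
    ...
    closedir(directory);
}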
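
The same opendir/readdir/closedir pattern can be wrapped in a function. The following is a small sketch; the name count_entries is hypothetical, and skipping "." and ".." is a choice made here, not something the slides prescribe.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: number of entries in `dir_name`, not counting
 * "." and "..". Returns -1 if the directory cannot be opened. */
int count_entries(const char *dir_name)
{
    assert(dir_name != NULL);

    DIR *directory = opendir(dir_name);
    if (directory == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(directory)) != NULL) {
        /* every directory contains links to itself and to its parent */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;
    }

    closedir(directory);
    return count;
}
```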
Buffered I/O

> Applications often read/write one character at a time


▪ getc, putc, ungetc
▪ gets, fgets: read line of text one character at a time, stopping at newline

> Implementing each of these as a separate Linux system call is expensive
▪ Each system call costs more than 10,000 clock cycles

> Solution: buffered read


▪ Use read system call to grab block of bytes
▪ User input functions take one byte at a time from buffer, refill buffer when empty

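[Figure: a file divided into a part not in the buffer, the buffered portion (bytes already read, followed by unread bytes), and an unseen part; the current file position is marked.]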
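
The buffered-read idea can be sketched in a few lines. This is an illustrative sketch, not the stdio implementation: the names bufreader, br_init and br_getc and the 4096-byte buffer size are assumptions, and EINTR is not handled.

```c
#include <assert.h>
#include <unistd.h>

#define BR_BUF_SIZE 4096           /* assumed buffer size */

typedef struct {
    int fd;                        /* descriptor to read from */
    ssize_t count;                 /* unread bytes left in buf */
    char *next;                    /* next unread byte in buf */
    char buf[BR_BUF_SIZE];
} bufreader;

/* Attach an empty buffer to an already open descriptor. */
void br_init(bufreader *br, int fd)
{
    assert(br != NULL);
    br->fd = fd;
    br->count = 0;
    br->next = br->buf;
}

/* Return the next byte (0..255), or -1 at end-of-file or on error.
 * read() is called only when the buffer is empty, so most calls cost
 * no system call at all. */
int br_getc(bufreader *br)
{
    assert(br != NULL);
    if (br->count == 0) {                           /* buffer empty: refill */
        ssize_t n = read(br->fd, br->buf, sizeof br->buf);
        if (n <= 0)
            return -1;                              /* EOF or error */
        br->count = n;
        br->next = br->buf;
    }
    br->count--;
    return (unsigned char)*br->next++;
}
```

After br_init(&br, fd), repeated calls to br_getc(&br) hand out one byte at a time while issuing roughly one read() per 4096 bytes of input.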


Buffering in Standard I/O

> Standard I/O functions use buffered I/O


printf("h");
printf("e");
printf("l");
printf("l");
printf("o");
printf("\n");
fflush(stdout);

buf: h e l l o \n . .   →   write(1, buf, 6);

> Buffer is flushed to the output fd on "\n" (when stdout is a terminal) or by a call to fflush


Memory-mapping I/O
> Maps a file into memory, so it can be accessed as a memory array.
▪ Implemented in the mmap function
▪ Only loads the required parts into RAM via on-demand loading (paging)
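
A minimal sketch of reading a file through mmap is given below. The function name count_byte_mmap is hypothetical, the mapping is read-only and private, and empty files are handled separately because mmap rejects a length of zero.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: counts how often byte `c` occurs in the file
 * `path` by mapping the file and scanning it like an array.
 * Returns the count, or -1 on error. */
long count_byte_mmap(const char *path, char c)
{
    struct stat st;

    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {              /* the file size is needed for mmap */
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {                 /* mmap fails for length 0 */
        close(fd);
        return 0;
    }

    char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                             /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (off_t i = 0; i < st.st_size; i++) /* pages are loaded on first access */
        if (data[i] == c)
            count++;

    munmap(data, (size_t)st.st_size);
    return count;
}
```

For example, count_byte_mmap(path, '\n') gives the number of newline characters in the file without any explicit read() call.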
Buffered vs. Memory-mapping I/O
> Comparison summary

Feature | Buffered I/O | Memory-mapping I/O
Performance | Good for sequential access | Great for random access, eliminates extra copying
System calls | Requires explicit read() and write() calls | Implicit: accessing memory triggers page loads
Memory usage | Uses a kernel buffer first, then copies to a user-space buffer | Directly maps file into memory
Best use case | Sequential reads/writes, small files | Large files, random access, shared memory by processes
Programming complexity | Simple and widely used | More complex, needs careful handling
NEXT LECTURE

[Flipped class] Working with processes


> Pre-class
▪ Study pre-class materials on Canvas

> In class
▪ Reinforcement/enrichment discussion

> Post class


▪ Homework
▪ Preparation for Lab 1
▪ Consultation (if needed)
