FILE I/O
CONTENTS
• Introduction
• Unix/Linux I/O
• File Descriptor
• Unix/Linux System Calls for File I/O
• The Open File Table
• Open File Object (Handle)
• Default Open Files
• The open(), read(), close() System Calls
• Standard I/O
• The Effect of fork() on Open Files
• The dup() and dup2() System Calls
• The pipe() System Call
Introduction
• A Unix/Linux file is a sequence of m bytes: B0, B1, ..., Bk, ..., Bm-1
• All I/O devices are represented as files.
• File Types:
o Regular file: binary or text file.
o Directory file: a file that contains the names and locations of other files.
o Device Files:
Character special files (or character devices or raw devices): provide non-buffered, direct
access to some hardware device such as terminals.
Block special files (or block devices): provide buffered access to hardware devices such as
disks. Unlike character devices, block devices allow the programmer to read or write a
block of any size.
o FIFO (named pipe): a file type used for inter-process communication.
o Socket: a file type used for network communication between processes.
Unix/Linux I/O
• The elegant mapping of files to devices allows the kernel to export a simple
interface called Unix I/O.
• Key idea: All input and output is handled in a consistent and uniform way.
• Basic Unix I/O operations (system calls):
o Opening and closing files
open() and close()
o Changing the current file position (seek)
lseek()
o Reading and writing a file
read() and write()
File Descriptor
• A file descriptor (a nonnegative integer) refers to an open file.
o 0 refers to standard input,
o 1 refers to standard output
o 2 refers to standard error
• The scope of file descriptors is per process.
• A file descriptor can be viewed as an index into an array of opened files.
Unix/Linux System Calls for Files
• creat: int creat(const char *pathname, mode_t mode);
o Create a new file and return a file descriptor for it.
o mode specifies the protection bits to be applied when the new file is created.
• open: int open(const char *pathname, int flags, [mode_t mode]);
o Open a file and return a file descriptor.
o flags must include exactly one of the following: O_RDONLY, O_WRONLY, or O_RDWR.
o mode specifies the protection bits to be applied if a new file is created.
• creat() is equivalent to open() with flags equal to O_CREAT|O_WRONLY|O_TRUNC.
• close: int close(int fd);
o Close the file descriptor fd.
Unix/Linux System Calls for Files (continued)
• read: ssize_t read(int fd, void *buf, size_t count);
o Read up to count bytes from fd into the buffer at buf.
• write: ssize_t write(int fd, const void *buf, size_t count);
o Write up to count bytes to fd from the buffer at buf.
• lseek: off_t lseek(int fd, off_t offset, int whence);
o Set the file offset to a new value by applying offset relative to whence (SEEK_SET, SEEK_CUR, or SEEK_END).
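The calls above can be combined in a small sketch: write a string, move the file offset back with lseek(), and read from the new position. The function name seek_demo and the scratch path passed by the caller are hypothetical, chosen only for illustration.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes "hello world" to the file at path, seeks back to byte 6, and
 * reads the remaining 5 bytes into out (which must hold at least 6 bytes).
 * Returns the number of bytes read, or -1 on error. */
ssize_t seek_demo(const char *path, char *out)
{
    const char msg[] = "hello world";
    ssize_t n = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* After the write, the offset sits at the end of the data. */
    if (write(fd, msg, strlen(msg)) == (ssize_t)strlen(msg) &&
        lseek(fd, 6, SEEK_SET) == 6) {      /* absolute position 6 */
        n = read(fd, out, 5);
        if (n >= 0)
            out[n] = '\0';
    }
    close(fd);
    return n;
}
```

With SEEK_SET the offset is measured from the start of the file; SEEK_CUR and SEEK_END measure from the current offset and from the end of the file.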
open()
• int open(filename, flags, [mode])
o Opens the file filename as specified by flags; mode gives the permissions if the file is created.
o Flags:
O_RDONLY, O_WRONLY, O_RDWR, O_CREAT, O_APPEND, O_TRUNC
O_CREAT: If the file does not exist, it is created. The mode argument gives the
initial permissions. Bits: rwx (user) rwx (group) rwx (others)
Example: 0555 means read and execute by user, group, and others. (101 binary == 5 octal)
O_APPEND: Append at the end of the file.
O_TRUNC: Truncate the file to length 0.
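As a minimal sketch of how the flags combine, the helper below opens a file with O_WRONLY|O_CREAT|O_APPEND so that every call adds to the end of the file. The names append_line and file_size are hypothetical helpers, and the mode 0644 is only an example (the process umask may clear some of these bits).

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Appends len bytes from text to the file at path, creating it with
 * permissions 0644 (rw-r--r--) if it does not exist.
 * Returns 0 on success, -1 on error. */
int append_line(const char *path, const char *text, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, text, len);
    if (close(fd) < 0)
        return -1;
    return n == (ssize_t)len ? 0 : -1;
}

/* Returns the size of the file in bytes, or -1 on error. */
long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}
```

Calling append_line() twice with four bytes each leaves an eight-byte file; replacing O_APPEND with O_TRUNC would leave only the last four bytes.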
read( )
• Reads data from an open file
ssize_t read(int fd, void *buf, size_t count)
• Arguments
o File descriptor: integer descriptor returned by open()
o Buffer: pointer to memory to store the bytes it reads
o Count: maximum number of bytes to read
• Returns
o Number of bytes read
o 0 if nothing more to read (end of file)
o -1 if an error occurred
• Performs a variety of checks
o Whether the file has been opened, whether reading is allowed
read( ) example:
char buf[512];
int fd; /* file descriptor */
ssize_t nbytes; /* number of bytes read */
/* Open file fd ... */
/* Then read up to 512 bytes from file fd */
if ((nbytes = read(fd, buf, sizeof(buf))) < 0) {
    perror("read");
    exit(1);
}
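Because read() may return fewer bytes than requested and returns 0 only at end of file, callers usually read in a loop. The sketch below shows one common pattern; the function name count_bytes is hypothetical, and retrying on EINTR is a design choice rather than something the slides prescribe.

```c
#include <errno.h>
#include <unistd.h>

/* Reads from fd until end of file.
 * Returns the total number of bytes read, or -1 on error. */
ssize_t count_bytes(int fd)
{
    char buf[512];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            total += n;            /* got some data, keep going */
        else if (n == 0)
            return total;          /* 0 means end of file */
        else if (errno != EINTR)
            return -1;             /* real error */
        /* EINTR: interrupted by a signal, retry the read */
    }
}
```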
write( ) example:
char buf[512];
int fd; /* file descriptor */
ssize_t nbytes; /* number of bytes written */
/* Open the file fd ... */
/* Then write up to 512 bytes from buf to file fd */
if ((nbytes = write(fd, buf, sizeof(buf))) < 0) {
    perror("write"); exit(1);
}
• Transfers up to 512 bytes from address buf to file fd.
• Returns the number of bytes written from buf to file fd, which may be less than requested.
• A returned value less than 0 indicates that an error occurred.
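Since write() may transfer fewer bytes than requested, a caller that needs the whole buffer written typically loops. The following is a sketch of that pattern under the assumption that retrying on EINTR is acceptable; write_all is a hypothetical helper name, not a library function.

```c
#include <errno.h>
#include <unistd.h>

/* Writes exactly count bytes from buf to fd, retrying after partial writes.
 * Returns count on success, or -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t left = count;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal, try again */
            return -1;
        }
        p += n;                    /* skip past the bytes already written */
        left -= (size_t)n;
    }
    return (ssize_t)count;
}
```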
close( ) example:
int close(int fd)
• Decrements the reference count of the open file object pointed to by fd.
• If the reference count of the open file object reaches 0, the
open file object is removed.
int fd; /* file descriptor */
int retval; /* return value */
if ((retval = close(fd)) < 0) {
    perror("close");
    exit(1);
}
The Open File Table
• Each process has a File Descriptor Table (maintained by the OS
kernel) with all the files that are opened by that process.
o Process can only affect it using system calls
• Each entry in the File Descriptor Table contains a pointer to an open
file object that contains all the information about the open file.
• The kernel maintains an Open File Table, which includes Open File
Objects for the whole system (shared by all processes).
The System View
Facts about the Open File Table
• One file (i-node) can have multiple entries in the Open File Table
• One entry in the Open File Table can be pointed to by multiple file descriptors
• When dup(), dup2(), and fork() are used, entries in Open File Table are shared.
• When open() is used, another entry in Open File Table is created.
• File descriptors sharing the same Open File Table entry will share the same offset,
which will be updated by read/write.
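The difference between sharing an entry and creating a new one can be observed from user space. In the sketch below (offset_after_read is a hypothetical name), a read through one descriptor moves the offset seen through a dup()'d descriptor, but not the offset of a descriptor obtained from a second open() of the same file.

```c
#include <fcntl.h>
#include <unistd.h>

/* Reads 4 bytes through a first descriptor, then reports the offset seen
 * through a second descriptor. If use_dup is nonzero the second descriptor
 * comes from dup(); otherwise it comes from a second open() of the file.
 * Returns the offset, or -1 on error. */
long offset_after_read(const char *path, int use_dup)
{
    char buf[4];
    long off = -1;

    int fd1 = open(path, O_RDONLY);
    if (fd1 < 0)
        return -1;
    int fd2 = use_dup ? dup(fd1) : open(path, O_RDONLY);
    if (fd2 < 0) {
        close(fd1);
        return -1;
    }
    if (read(fd1, buf, sizeof(buf)) == (ssize_t)sizeof(buf))
        off = (long)lseek(fd2, 0, SEEK_CUR);   /* query fd2's offset */
    close(fd1);
    close(fd2);
    return off;
}
```

For a file of at least 4 bytes, the dup() case reports 4 (shared entry, shared offset) and the open() case reports 0 (separate entry, separate offset).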
A Process' Open File Table (File Descriptor Table)
[Figure: a process' file table with entries 0, 1, 2, 3, 4, ..., 31. An entry points to an Open File Object with the fields I-NODE, Open Mode, Offset, and Reference Count.]
Open File Object (or Open File Handle)
• An Open File Object contains the state of an open file.
o I-Node –
It identifies the file in the computer. A file is uniquely identified by two parts:
Device number (major and minor) – Determines the device.
I-node number – Determines which file it refers to inside the device.
o Open Mode – How the file was opened:
Read Only
Read Write
Append
Open File Object (or Open File Handle)
• Offset –
o The next read or write operation will start at this offset in the file.
o Each read/write operation increases the offset by the number of bytes read/written.
• Reference Count –
o It counts the number of file descriptors that point to this Open File Object.
o When the reference count reaches 0 the Open File Object is removed.
o The reference count is initially 1.
o It is increased after fork() or calls like dup and dup2.
Default Open Files
• When a process is created, there are three files opened by default:
o 0 – Default Standard Input
o 1 – Default Standard Output
o 2 – Default Standard Error
write(1, "Hello", 5); /* sends Hello to stdout */
write(2, "Hello", 5); /* sends Hello to stderr */
• stdin, stdout, and stderr are inherited from the parent process.
Standard I/O Functions
• The C standard library contains a collection of higher-level standard I/O functions.
• Standard I/O library builds on the OS services
o Calls OS-specific system calls for low-level I/O
o Adds features such as formatted I/O and buffered I/O
• Standard I/O models open files as streams: abstraction for a file descriptor and a
buffer in memory.
• Examples of standard I/O functions:
o Opening and closing files (fopen and fclose)
o Reading and writing bytes (fread and fwrite)
o Reading and writing text lines (fgets and fputs)
o Formatted reading and writing (fscanf and fprintf)
Buffering in Standard I/O
• Standard I/O functions use buffered I/O
printf("h");
printf("e");
printf("l");
printf("l");
printf("o");
printf("\n");
buf: h e l l o \n . .
fflush(stdout);
write(1, buf, 6);
• The characters accumulate in buf; flushing the buffer issues a single write() of 6 bytes.
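The effect of buffering can be made visible with a file instead of the terminal. The sketch below is an illustration under assumptions: it writes through a fully buffered stream (requested explicitly with setvbuf) and compares the size of the file on disk before and after fflush(). The names size_on_disk and buffering_demo are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Returns the current size of the file on disk, or -1 on error. */
static long size_on_disk(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Writes "hello\n" one character at a time through a fully buffered
 * stream and records the on-disk size before and after fflush().
 * Returns 0 on success, -1 on error. */
int buffering_demo(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);   /* full buffering, stdio-owned buffer */

    for (const char *c = "hello\n"; *c != '\0'; c++)
        fputc(*c, fp);                   /* lands in the buffer, not the file */

    *before = size_on_disk(path);        /* nothing has reached the kernel yet */
    fflush(fp);                          /* one write() of 6 bytes */
    *after = size_on_disk(path);

    return fclose(fp) == 0 ? 0 : -1;
}
```

On typical implementations the size is 0 before the flush and 6 after it, since six characters are far below BUFSIZ.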
Unix I/O vs. Standard I/O
• Standard I/O is implemented using low-level Unix I/O.
• Layers, from top to bottom:
o C application program
o Standard I/O functions: fopen, fdopen, fread, fwrite, fscanf, fprintf, sscanf, sprintf, fgets, fputs, fflush, fseek, fclose
o Unix I/O functions (accessed via system calls): open, read, write, lseek, stat, close
The Stream Abstraction
• Any source of input or destination for output
o e.g., keyboard as input, and screen as output
o e.g., files on disk, network ports, printer port, …
• C programs access streams through file pointers
o e.g., FILE *fp1, *fp2;
o e.g., fp1 = fopen("myfile.txt", "r");
• Three streams provided by stdio.h
o Streams stdin, stdout, and stderr
Typically map to keyboard, screen, and screen
o Can redirect to correspond to other streams
e.g., stdin can be the output of another program
e.g., stdout can be the input to another program
Example: Opening a File
• FILE *fopen("myfile.txt", "r")
o Open the named file and return a stream
o Includes a mode, such as "r" for read or "w" for write
• Creates a FILE data structure for the file
o File descriptor, mode, status, buffer, …
o Assigns fields and returns a pointer
• Opens or creates the file, based on the mode
o Write ("w"): create the file with default permissions, or truncate it if it already exists
o Read ("r"): open the file as read-only
o Append ("a"): open or create the file, and seek to the end
Example: Formatted I/O
• int fprintf(fp1, "Number: %d\n", i)
o Convert and write output to the stream in the specified format
• int fscanf(fp1, "FooBar: %d", &i)
o Read from the stream in the specified format and assign the converted values
• Specialized versions
o printf(…) is just fprintf(stdout, …)
o scanf(…) is just fscanf(stdin, …)
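A short round trip illustrates how fprintf() and fscanf() mirror each other: the same format string that produced the text can parse it back. This is a sketch only; the function name roundtrip and the file path supplied by the caller are hypothetical.

```c
#include <stdio.h>

/* Writes "Number: <value>" to the file at path with fprintf(), then
 * reads it back with fscanf() into *out.
 * Returns 0 on success, -1 on error. */
int roundtrip(const char *path, int value, int *out)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "Number: %d\n", value);
    if (fclose(fp) != 0)                 /* fclose flushes the buffer */
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    /* fscanf returns the number of values successfully converted. */
    int converted = fscanf(fp, "Number: %d", out);
    fclose(fp);
    return converted == 1 ? 0 : -1;
}
```

Checking the return value of fscanf() matters: it is the only indication that the input actually matched the format.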
Details of FILE in stdio.h
#define OPEN_MAX 20 /* max files open at once */
typedef struct _iobuf {
    int cnt; /* number of characters left in buffer */
    char *ptr; /* ptr to next char in buffer */
    char *base; /* beginning of buffer */
    int flag; /* open mode flags, etc. */
    int fd; /* file descriptor */
} FILE;
extern FILE _iob[OPEN_MAX];
#define stdin (&_iob[0])
#define stdout (&_iob[1])
#define stderr (&_iob[2])
The Effect of fork() on Opened Files
• The File Descriptor Table of the parent is copied into the child.
• The Open File Objects of the parent are shared with the child.
• The reference counters of the Open File Objects are increased.
• By sharing the same open file objects, parent and child or
multiple children can communicate with each other.
• They share the same offset: after one process reads from the file, the offset
is updated for the others as well.
• This allows commands in a pipeline to communicate.
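The shared offset can be checked with a small experiment: the child reads from an inherited descriptor, and the parent then asks for its own offset on the same descriptor. This is a sketch under the assumption that the file has at least 4 bytes; the function name is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Opens path, forks, lets the child read 4 bytes, and returns the
 * offset the parent observes afterwards, or -1 on error. */
long parent_offset_after_child_read(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: fd refers to the same open file object as in the parent. */
        char buf[4];
        ssize_t n = read(fd, buf, sizeof(buf));
        _exit(n == 4 ? 0 : 1);
    }

    /* Parent: wait for the child, then query the current offset. */
    waitpid(pid, NULL, 0);
    long off = (long)lseek(fd, 0, SEEK_CUR);
    close(fd);
    return off;
}
```

The parent never reads, yet it observes offset 4, because the offset lives in the shared open file object rather than in the per-process descriptor table.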
The dup() system call
fd2 = dup(fd1)
• dup(fd1) returns a new file descriptor that points to the
same file object that fd1 is pointing to.
• The reference counter of the open file object that fd1 refers
to is increased.
• This will be useful to “save” the stdin, stdout, stderr, so the
shell process can restore it after doing the redirection.
The dup2() System Call
int dup2(int fd1, int fd2)
• The dup2() system call performs the same task as dup(), but instead
of using the lowest-numbered unused file descriptor, it uses the file
descriptor number specified in fd2.
• After dup2(fd1,fd2), fd2 refers to the same open file object as fd1.
• The open file object that fd2 referred to before is closed.
• The reference counter of the open file object that fd1 refers to is
increased.
• dup2() will be useful to redirect stdin, stdout, and stderr.
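The save-and-restore pattern that a shell uses can be sketched by combining dup() and dup2(). The function below is a hypothetical helper: it saves stdout, points descriptor 1 at a file, writes through descriptor 1, and then restores the original stdout. It uses write() rather than printf() so that stdio buffering does not interfere with the demonstration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Temporarily redirects stdout to the file at path, writes len bytes of
 * msg through descriptor 1, then restores the original stdout.
 * Returns 0 on success, -1 on error. */
int write_to_file_via_stdout(const char *path, const char *msg, size_t len)
{
    int rc = -1;

    int saved = dup(STDOUT_FILENO);          /* remember the current stdout */
    if (saved < 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }

    if (dup2(fd, STDOUT_FILENO) >= 0) {      /* descriptor 1 now refers to the file */
        rc = write(STDOUT_FILENO, msg, len) == (ssize_t)len ? 0 : -1;
        if (dup2(saved, STDOUT_FILENO) < 0)  /* put the original stdout back */
            rc = -1;
    }
    close(fd);                               /* drop the extra references */
    close(saved);
    return rc;
}
```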
Example: Redirecting stdout
A program that redirects stdout to a file myoutput.txt:
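#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
    // Create a new file
    int fd = open("myoutput.txt", O_CREAT|O_WRONLY|O_TRUNC, 0664);
    if (fd < 0) {
        perror("open"); exit(1);
    }
    // Redirect stdout to file
    if (dup2(fd, 1) < 0) {
        perror("dup2"); exit(1);
    }
    close(fd); // fd 1 now refers to the file; the extra descriptor is not needed
    // Now printf, which prints to stdout, will write to myoutput.txt
    printf("Hello world\n");
    return 0;
}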
The pipe() System Call
int pipe(int pp[2])
• pp[2] is an array of int with two elements.
• After calling pipe, pp[] will contain two file descriptors
that point to two open file objects that are interconnected.
• What is written into pp[1] can be read from pp[0].
• In Linux, pipes are unidirectional.
• Pipes are implemented using file descriptors and share many
behaviors with files. They exist only in memory and do not
have a presence in the file system, unlike regular files.
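A minimal sketch of the data flow: bytes written to pp[1] come back out of pp[0]. It runs in a single process for simplicity and assumes the message is small enough to fit in the pipe's buffer, so the write does not block. The name pipe_roundtrip is hypothetical.

```c
#include <unistd.h>

/* Writes len bytes of msg into a pipe and reads them back into out.
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, size_t len, char *out, size_t outsize)
{
    int pp[2];
    ssize_t n = -1;

    if (pipe(pp) < 0)
        return -1;

    /* pp[1] is the write end, pp[0] is the read end. */
    if (write(pp[1], msg, len) == (ssize_t)len)
        n = read(pp[0], out, outsize);

    close(pp[0]);
    close(pp[1]);
    return n;
}
```

In real use the two ends normally belong to different processes created with fork(), as in the lsgrep example.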
Example of pipes and redirection
• Write a program "lsgrep arg1 arg2" that runs
"ls -al | grep arg1 > arg2"
• Example: "lsgrep aa myout" lists the entries whose ls -al line contains
"aa" and puts the output in file myout.
Example of pipe and redirection
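#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    if (argc != 3) {
        fprintf(stderr, "usage: lsgrep arg1 arg2\n"); exit(1);
    }
    //save stdin/stdout
    int tempin = dup(0);
    int tempout = dup(1);
    //create pipe
    int pp[2];
    if (pipe(pp) < 0) {
        perror("pipe"); exit(1);
    }
    //redirect stdout to the write end of the pipe
    dup2(pp[1], 1);
    close(pp[1]);
    // fork for "ls"
    int pid = fork();
    if (pid == 0) {
        // close file descriptors as soon as they are not needed
        close(pp[0]);
        close(tempin);
        close(tempout);
        char* args[3];
        args[0] = "ls";
        args[1] = "-al";
        args[2] = NULL;
        execvp(args[0], args);
        perror("execvp"); _exit(1); // reached only if execvp fails
    }
    //redirect stdin to the read end of the pipe
    dup2(pp[0], 0);
    close(pp[0]);
    //create outfile
    int fd = open(argv[2], O_WRONLY|O_CREAT|O_TRUNC, 0600);
    if (fd < 0) {
        perror("open"); exit(1);
    }
    //redirect stdout to the output file
    dup2(fd, 1);
    close(fd);
    // fork for "grep"
    int pid2 = fork();
    if (pid2 == 0) {
        close(tempin);
        close(tempout);
        char* args[3];
        args[0] = "grep";
        args[1] = argv[1];
        args[2] = NULL;
        execvp(args[0], args);
        perror("execvp"); _exit(1); // reached only if execvp fails
    }
    // Restore stdin/stdout
    dup2(tempin, 0);
    dup2(tempout, 1);
    close(tempin);
    close(tempout);
    // Parent waits for both child processes
    waitpid(pid, NULL, 0);
    waitpid(pid2, NULL, 0);
    printf("All done!!\n");
    return 0;
} // main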
Closing Unused Pipes
• The unused write end of the pipe must be closed; otherwise readers keep waiting for
more to be written to the pipe and will not exit.
o A reader sees EOF only when all write ends of the pipe are closed.
• It is good practice to close the unused read end of the pipe as well, so that the writer process gets
an error signal (SIGPIPE) when writing to a pipe that has no readers left.
o Otherwise a writer process may keep writing until the pipe is full and the process is
blocked.
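Both rules can be observed in a single process. In the sketch below (function names are hypothetical), the first function closes the only write end and shows that read() then returns 0, that is, EOF. The second closes the only read end and writes to the pipe; SIGPIPE is ignored for the duration of the call so that write() fails with EPIPE instead of the signal terminating the process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Closes the write end of a fresh pipe and then reads from it.
 * Returns what read() returns: 0 (EOF) is expected, -1 on error. */
ssize_t read_after_writer_closed(void)
{
    int pp[2];
    char c;

    if (pipe(pp) < 0)
        return -1;
    close(pp[1]);                       /* no write ends remain open */
    ssize_t n = read(pp[0], &c, 1);     /* returns immediately with EOF */
    close(pp[0]);
    return n;
}

/* Closes the read end of a fresh pipe and then writes to it.
 * Returns the errno value of the failed write (EPIPE is expected),
 * 0 if the write unexpectedly succeeded, or -1 if pipe() failed. */
int write_without_reader(void)
{
    int pp[2];

    if (pipe(pp) < 0)
        return -1;
    /* Default action of SIGPIPE is to kill the process; ignore it here. */
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);
    close(pp[0]);                       /* no read ends remain open */
    ssize_t n = write(pp[1], "x", 1);
    int err = (n < 0) ? errno : 0;
    signal(SIGPIPE, old);               /* restore the previous handler */
    close(pp[1]);
    return err;
}
```

If the write end were left open in the first function, the read would block forever, which is the hang described above.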