APUE Notes - File I/O

Notes on APUE (Advanced Programming in the UNIX Environment).

Most file I/O on a UNIX system can be performed using only five functions: open, read, write, lseek, and close. These are often called unbuffered I/O functions, meaning that each read or write invokes a system call in the kernel. They are not part of ISO C, but they are part of POSIX.1.

File Descriptors

To the kernel, all open files are referred to by file descriptors, which are non-negative integers. When we open an existing file or create a new file, the kernel returns a file descriptor to the process.
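By convention, descriptor 0 is standard input, 1 is standard output and 2 is standard error; <unistd.h> names them STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO. POSIX also requires open to return the lowest-numbered descriptor not currently open. The sketch below checks that rule in a single-threaded program; the helper name reuses_lowest_fd and the use of /dev/null are illustrative choices, not anything from the book.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a descriptor freed by close() is handed
   out again by the next open(), as the "lowest unused descriptor" rule
   implies; 0 if not; -1 if the initial opens fail. */
int reuses_lowest_fd(void)
{
    int a = open("/dev/null", O_RDONLY);
    if (a < 0)
        return -1;
    int b = open("/dev/null", O_RDONLY);
    if (b < 0) {
        close(a);
        return -1;
    }

    close(a);                              /* a is now the lowest free slot */
    int c = open("/dev/null", O_RDONLY);   /* expected to reuse a's number */

    int same = (c == a);
    close(b);
    if (c >= 0)
        close(c);
    return same;
}
```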

open/creat Functions

#include <fcntl.h>
int open(const char *path, int oflag, ... /* mode_t mode */ );
int creat(const char *path, mode_t mode);
/* Both return: file descriptor if OK, −1 on error */

Note that creat is equivalent to

open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
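One limitation of creat is that the file is opened only for writing. If we want to create a file, write it, and read it back, a common alternative is to call open with O_RDWR instead of O_WRONLY. The sketch below assumes that approach; create_rw and write_then_read are hypothetical helper names, and mode 0600 is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX.1-2008 declarations (mkstemp, ...) */
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Like creat(), but the file is opened for reading as well as writing. */
int create_rw(const char *path, mode_t mode)
{
    return open(path, O_RDWR | O_CREAT | O_TRUNC, mode);
}

/* Hypothetical round trip: (re)create path, write msg, rewind, and read the
   contents back into buf. Returns the number of bytes read, or -1 on error. */
ssize_t write_then_read(const char *path, const char *msg, char *buf, size_t bufsize)
{
    int fd = create_rw(path, S_IRUSR | S_IWUSR);   /* mode 0600 */
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = -1;
    if (write(fd, msg, len) == (ssize_t)len && lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, buf, bufsize);
    close(fd);
    return n;
}
```

Because of O_TRUNC, recreating an existing file with a shorter message leaves no stale bytes from the earlier contents.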

close Function

When a process terminates, all of its open files are closed automatically by the kernel.
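Even though the kernel closes everything at exit, close itself can fail, so long-running programs usually check its return value. Once a descriptor is closed it no longer refers to an open file, and closing it again fails with EBADF. The sketch below demonstrates that in a single-threaded program; double_close_errno is a hypothetical name. (In a multithreaded program a double close is worse than an error, since another thread may already have been given the same descriptor number.)

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno from closing the same descriptor
   twice (expected EBADF), 0 if the second close unexpectedly succeeds,
   or -1 if the initial open/close fails. */
int double_close_errno(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0 || close(fd) < 0)
        return -1;

    /* fd no longer refers to an open file */
    if (close(fd) == 0)
        return 0;
    return errno;
}
```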

lseek Function

Because a successful call to lseek returns the new file offset, we can seek zero bytes from the current position to determine the current offset:

off_t    currpos;
currpos = lseek(fd, 0, SEEK_CUR);

This technique can also be used to determine if a file is capable of seeking. If the file descriptor refers to a pipe, FIFO, or socket, lseek sets errno to ESPIPE and returns −1.

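#include <stdio.h>
#include <unistd.h>

int
main(void) {
    /* seeking 0 bytes from the current position fails with
       ESPIPE if standard input is a pipe, FIFO, or socket */
    if (lseek(STDIN_FILENO, 0, SEEK_CUR) == -1) {
        printf("cannot seek\n");
    } else {
        printf("seek OK\n");
    }
    return 0;
}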

Test:

$ ./a.out < /etc/passwd
seek OK
$ cat < /etc/passwd | ./a.out
cannot seek
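The same two ideas can be wrapped as small helpers: reading the current offset with lseek(fd, 0, SEEK_CUR), and treating an ESPIPE failure as "this descriptor cannot seek". The names current_offset and is_seekable are hypothetical; the test also shows that each write advances the offset by the number of bytes written.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX.1-2008 declarations (mkstemp, ...) */
#include <errno.h>
#include <unistd.h>

/* Current file offset, found by seeking 0 bytes from the current position. */
off_t current_offset(int fd)
{
    return lseek(fd, 0, SEEK_CUR);
}

/* Returns 1 if fd supports seeking, 0 if lseek fails with ESPIPE
   (pipe, FIFO, or socket), and -1 for any other error. */
int is_seekable(int fd)
{
    if (current_offset(fd) != -1)
        return 1;
    return errno == ESPIPE ? 0 : -1;
}
```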

Copying a File Using Only read and write

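#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFFSIZE 4096

int
main(void) {
    ssize_t n;
    char buf[BUFFSIZE];

    /* copy standard input to standard output until read returns 0 (EOF) */
    while ((n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0) {
        if (write(STDOUT_FILENO, buf, n) != n) {
            perror("write error");
            exit(1);
        }
    }
    if (n < 0) {
        perror("read error");
        exit(1);
    }
    exit(0);
}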

Test:

$ cc 03-03-copy.c
$ ./a.out < foo.txt > bar.txt

Changing BUFFSIZE is a simple way to test I/O efficiency. We tried buffer sizes from 1 byte to 512 KB and found that from 1024 bytes upward, the timings are roughly the same.
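One way to see why small buffers are slow is to count system calls: copying a regular file with a buffer of b bytes takes about ceil(filesize / b) + 1 read calls, the last one returning 0 at end of file. The sketch below is a hypothetical variant of the copy loop (copy_count_reads is not from the book) that takes the buffer size as a parameter and returns that call count; timing each size, for example with the shell's time command, is left to the caller.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX.1-2008 declarations (mkstemp, ...) */
#include <stdlib.h>
#include <unistd.h>

/* Copies in -> out using a bufsize-byte buffer. Returns the number of
   read() calls made, including the final one that sees end of file,
   or -1 on error. */
long copy_count_reads(int in, int out, size_t bufsize)
{
    char *buf = malloc(bufsize);
    if (buf == NULL)
        return -1;

    long calls = 0;
    for (;;) {
        ssize_t n = read(in, buf, bufsize);
        calls++;
        if (n == 0)
            break;                          /* end of file */
        if (n < 0 || write(out, buf, (size_t)n) != n) {
            calls = -1;                     /* read or write error */
            break;
        }
    }
    free(buf);
    return calls;
}
```

For a 10000-byte file, a 4096-byte buffer needs 4 reads (4096, 4096, 1808, then 0), while a 1-byte buffer needs 10001.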

Atomic Operations

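Consider several processes appending to the same file. Calling lseek to move to the end of file and then calling write is not atomic: another process can write between the two calls, and one process's data can overwrite the other's. The UNIX System provides an atomic way to do this operation if we set the O_APPEND flag when the file is opened. This causes the kernel to position the file to its current end of file before each write, so we no longer have to call lseek before each write.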
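To see O_APPEND at work, the sketch below opens an existing file with O_APPEND, deliberately seeks back to offset 0, and then writes. Because the kernel repositions to end of file before every write, the data is still appended. The helper name append_after_rewind is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX.1-2008 declarations (mkstemp, ...) */
#include <fcntl.h>
#include <unistd.h>

/* Opens path with O_APPEND, seeks to the start, then writes len bytes.
   Returns the file offset after the write (the new end of file),
   or -1 on error. */
off_t append_after_rewind(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;

    off_t end = -1;
    /* the lseek to 0 has no effect on where the write lands */
    if (lseek(fd, 0, SEEK_SET) == 0 && write(fd, data, len) == (ssize_t)len)
        end = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return end;
}
```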

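See also the pread and pwrite functions, which combine the seek and the I/O into one atomic operation: they read or write at a given offset and do not change the current file offset.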
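A minimal sketch of positional I/O, assuming a regular file opened for reading and writing: it overwrites bytes at a chosen offset with pwrite, reads them back with pread, and checks that the descriptor's own offset did not move. The function name is hypothetical. One caution: on Linux, if the file was opened with O_APPEND, pwrite appends regardless of the offset argument, so this sketch assumes O_APPEND is not set.

```c
#define _XOPEN_SOURCE 700   /* expose pread/pwrite and mkstemp */
#include <unistd.h>

/* Hypothetical demo: write len bytes at `offset` with pwrite, read them
   back into out with pread, and return 1 if the current file offset is
   unchanged afterwards, 0 if it moved, or -1 on error. */
int positional_io_keeps_offset(int fd, off_t offset,
                               const char *data, size_t len, char *out)
{
    off_t before = lseek(fd, 0, SEEK_CUR);
    if (before == -1)
        return -1;
    if (pwrite(fd, data, len, offset) != (ssize_t)len)
        return -1;
    if (pread(fd, out, len, offset) != (ssize_t)len)
        return -1;
    return lseek(fd, 0, SEEK_CUR) == before;
}
```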

dup and dup2 Functions

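dup(fd) returns a new descriptor, the lowest-numbered one available, that refers to the same open file as fd. With dup2(fd, fd2) we choose the new descriptor's value: if fd2 is already open, it is first closed, and if fd equals fd2, dup2 returns fd2 without closing it. The call dup(fd) is equivalent to fcntl(fd, F_DUPFD, 0). Similarly, the call dup2(fd, fd2) is equivalent to close(fd2) followed by fcntl(fd, F_DUPFD, fd2).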

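dup2 is an atomic operation, whereas the alternate form involves two function calls, so a signal catcher or another thread could run between the close and the fcntl and change the descriptors.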
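A duplicated descriptor shares the same file table entry as the original, and therefore the same file offset. The sketch below makes a hypothetical descriptor number (target, 20 in the test) a copy of fd with dup2, writes through fd, and reads the offset back through the copy; the function name is also hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX.1-2008 declarations (mkstemp, ...) */
#include <unistd.h>

/* Hypothetical demo: make `target` a duplicate of fd, write len bytes
   through fd, then query the offset through `target`. Since both share one
   file table entry, the offsets agree. Returns that offset, or -1 on
   error. Assumes target != fd. */
off_t offset_seen_through_dup2(int fd, int target, const void *data, size_t len)
{
    if (dup2(fd, target) != target)   /* closes target first if it was open */
        return -1;

    off_t off = -1;
    if (write(fd, data, len) == (ssize_t)len)
        off = lseek(target, 0, SEEK_CUR);
    close(target);                    /* closing the copy leaves fd open */
    return off;
}
```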

sync, fsync, and fdatasync Functions

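Traditional implementations of the UNIX System have a buffer cache or page cache in the kernel through which most disk I/O passes. When we write data to a file, the data is normally copied by the kernel into one of its buffers and queued for writing to disk at some later time. This is called delayed write. To keep the file system on disk consistent with the contents of the buffer cache, three functions are provided: sync queues all modified block buffers for writing and returns without waiting for the writes to finish; fsync applies to a single file descriptor and waits for the disk writes to complete before returning; fdatasync is similar to fsync but affects only the data portions of the file, whereas fsync also updates the file's attributes synchronously.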
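An application that must know its data has reached the disk (a database, for example) writes and then calls fsync. A minimal sketch, assuming a regular file opened for writing: the hypothetical helper write_all_and_sync keeps calling write until every byte is accepted (write may return a short count or be interrupted by a signal) and then calls fsync, returning its result.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX.1-2008 declarations (mkstemp, ...) */
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes, retrying short writes and
   EINTR, then fsync so the call returns only after the kernel reports the
   file's data and attributes written. Returns 0 on success, -1 on error. */
int write_all_and_sync(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted before writing; retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return fsync(fd);
}
```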

fcntl Function

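The fcntl function can change the properties of a file that is already open. The following program takes a single command-line argument that specifies a file descriptor and prints a description of that descriptor's file status flags, fetched with F_GETFL.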

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int
main(int argc, char *argv[])
{
    int val;

    if (argc != 2) {
        fprintf(stderr, "usage: a.out <descriptor#>\n");
        exit(1);
    }
    /* fetch the file status flags for the given descriptor */
    if ((val = fcntl(atoi(argv[1]), F_GETFL, 0)) < 0) {
        perror("fcntl error");
        exit(1);
    }
    /* the access mode is not a single bit, so mask it with O_ACCMODE */
    switch (val & O_ACCMODE) {
        case O_RDONLY:
            printf("read only");
            break;
        case O_WRONLY:
            printf("write only");
            break;
        case O_RDWR:
            printf("read write");
            break;
        default:
            printf("unknown access mode");
    }
    if (val & O_APPEND)
        printf(", append");
    if (val & O_NONBLOCK)
        printf(", nonblocking");
    if (val & O_SYNC)
        printf(", synchronous writes");
#if !defined(_POSIX_C_SOURCE) && defined(O_FSYNC) && (O_FSYNC != O_SYNC)
    /* some systems define a separate O_FSYNC flag */
    if (val & O_FSYNC)
        printf(", synchronous writes");
#endif
    putchar('\n');
    exit(0);
}

Test:

$ cc 03-04-fcntl.c
$ ./a.out 0 < /dev/tty
read only
$ ./a.out 1 > temp.foo
$ cat temp.foo
write only
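Changing flags goes the other way: F_SETFL replaces the whole set of file status flags, so turning one flag on means reading the current flags with F_GETFL, OR-ing in the new bit, and writing the result back (the access mode cannot be changed this way). A minimal sketch; set_status_flags is a hypothetical name, and the test uses an empty pipe to show that O_NONBLOCK makes read fail with EAGAIN instead of blocking.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX.1-2008 declarations */
#include <fcntl.h>

/* Hypothetical helper: turn on the file status flags in `flags` without
   clearing the ones already set. Returns -1 on error, another value
   on success. */
int set_status_flags(int fd, int flags)
{
    int val = fcntl(fd, F_GETFL, 0);
    if (val < 0)
        return -1;
    return fcntl(fd, F_SETFL, val | flags);
}
```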

ioctl Function

The ioctl function has always been the catchall for I/O operations. Anything that couldn’t be expressed using one of the other functions in this chapter usually ended up being specified with an ioctl. Terminal I/O was the biggest user of this function.
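A typical terminal ioctl is TIOCGWINSZ, which asks the terminal driver for the window size. It is not part of the POSIX ioctl interface, but Linux and BSD-derived systems provide it through <sys/ioctl.h>. The sketch below assumes such a system; terminal_size is a hypothetical helper. On a descriptor that is not a terminal, such as /dev/null, the request fails, typically with ENOTTY.

```c
#include <sys/ioctl.h>   /* ioctl, TIOCGWINSZ, struct winsize (not POSIX) */
#include <unistd.h>

/* Hypothetical helper: fill rows/cols with the terminal's window size.
   Returns 0 on success, or -1 (errno set by ioctl) if fd is not a
   terminal or the request fails. */
int terminal_size(int fd, unsigned short *rows, unsigned short *cols)
{
    struct winsize ws;
    if (ioctl(fd, TIOCGWINSZ, &ws) < 0)
        return -1;
    *rows = ws.ws_row;
    *cols = ws.ws_col;
    return 0;
}
```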

/dev/fd

Newer systems provide a directory named /dev/fd whose entries are files named 0, 1, 2, and so on. Opening the file /dev/fd/n is equivalent to duplicating descriptor n, assuming that descriptor n is open.
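A sketch of using /dev/fd from C, assuming the directory exists on the system: open_dev_fd builds the path /dev/fd/n and opens it, and same_file compares device and i-node numbers with fstat. Both names are hypothetical. Linux is an exception to the "equivalent to dup" description: its /dev/fd entries resolve to the underlying files, so opening one reopens the file with its own offset rather than sharing the original descriptor's. That is why the test only checks that both descriptors name the same file.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX.1-2008 declarations (mkstemp, ...) */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: open /dev/fd/<fd> with the given flags.
   Returns the new descriptor, or -1 on error. */
int open_dev_fd(int fd, int oflag)
{
    char path[32];
    snprintf(path, sizeof path, "/dev/fd/%d", fd);
    return open(path, oflag);
}

/* Returns 1 if descriptors a and b refer to the same file
   (same device and i-node number), 0 otherwise. */
int same_file(int a, int b)
{
    struct stat sa, sb;
    if (fstat(a, &sa) < 0 || fstat(b, &sb) < 0)
        return 0;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```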