Linux File and Device Programming

File Access Primitives

One of the most important abstractions in the POSIX API is the file. Although almost all operating systems use files for permanent storage, all versions of Unix provide access to most system resources through the file abstraction.
More specifically, this means that Linux uses the same set of system calls to provide access to devices (such as floppy disks and tape devices), network resources (most commonly TCP/IP connections), system terminals, and even kernel status information. Because these system calls are so ubiquitous, it is important for every Linux programmer to be fluent in the file-related calls. Let's take a closer look at some basic concepts behind the file API and describe the most important file-related system calls.

Linux provides many different types of files. The most common type is the regular file, which stores a quantity of information for later access. Most of the files you use, such as executables (/bin/vi), data files (/etc/passwd), and system libraries (/lib/libc.so.6), are regular files. They usually reside somewhere on disk, but we will see later that this is not necessarily the case.

Another file type is the directory, which contains a list of other files and their locations. When you use the ls command to list the files in a directory, it opens the directory file and prints information about all the files it contains.

Other file types include block devices (which represent devices that go through the file system cache, such as hard drives), character devices (which represent uncached devices, such as tape drives, mice, and system terminals), pipes and sockets (which allow processes to communicate with each other), and symbolic links (which allow files to have multiple names in the directory hierarchy).
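As a rough illustration of these file types, the sketch below asks the kernel what kind of file a path names. It relies on lstat() and the S_IS*() macros from <sys/stat.h>, which this article does not otherwise cover, and the helper name file_type_name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations under strict -std modes */
#include <sys/stat.h>

/* Hypothetical helper: returns a short label for the type of file that
   `path` names, or "unknown" if lstat() fails. lstat() is used instead of
   stat() so that a symbolic link is reported as a link, not as its target. */
const char *file_type_name(const char *path) {
    struct stat st;

    if (lstat(path, &st) < 0)
        return "unknown";
    if (S_ISREG(st.st_mode))  return "regular file";
    if (S_ISDIR(st.st_mode))  return "directory";
    if (S_ISBLK(st.st_mode))  return "block device";
    if (S_ISCHR(st.st_mode))  return "character device";
    if (S_ISFIFO(st.st_mode)) return "pipe";
    if (S_ISSOCK(st.st_mode)) return "socket";
    if (S_ISLNK(st.st_mode))  return "symbolic link";
    return "unknown";
}
```

Calling file_type_name("/") should report a directory on any ordinary system; results for device files depend on what exists under /dev on the machine at hand.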

Most files have one or more symbolic names that refer to them. These symbolic names are sets of strings delimited by / characters that identify the file to the kernel. They are the path names Linux users are familiar with. For example, the path name /home/ewt/article refers to a file on my laptop that contains the text of this article. No two files can share the same name (although a single file can have multiple names), so a path name uniquely identifies a single file.

Each file that a process can access is identified by a small non-negative integer called a "file descriptor". File descriptors are created by the system calls that open files, and they are inherited by new child processes created from the current process. That is, when a process starts a new program, the original process's open files are normally inherited by the new program.
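To make descriptor inheritance concrete, here is a minimal sketch in which a child process writes to a descriptor it never opened itself. It uses pipe(), fork(), and waitpid(), which are outside the scope of this article, and the helper name read_from_child is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations under strict -std modes */
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that writes a 17-byte message to a
   pipe descriptor it inherited, and returns the number of bytes the parent
   read into `buf` (or -1 on error). */
ssize_t read_from_child(char *buf, size_t size) {
    int fds[2];                 /* fds[0] is the read end, fds[1] the write end */
    pid_t pid;
    ssize_t n;

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: fds[1] is valid here only because it was inherited. */
        close(fds[0]);
        if (write(fds[1], "hello from child\n", 17) != 17)
            _exit(1);
        _exit(0);
    }

    /* Parent: close our copy of the write end, then read the child's data. */
    close(fds[1]);
    n = read(fds[0], buf, size);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```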

By convention, most programs reserve the first three file descriptors (0, 1, and 2) for special purposes: access to the so-called standard input, standard output, and standard error streams. File descriptor 0 is standard input, where many programs receive input from the outside world. File descriptor 1 is standard output, where most programs display normal output. For error-related output, file descriptor 2 (standard error) is used.

Anyone who is familiar with a Linux shell has seen the standard input, output, and error file descriptors in use. Normally, the shell runs commands with file descriptors 0, 1, and 2 all referring to the shell's terminal. When the > character tells the shell to send a program's output to a file, the shell opens that file as file descriptor 1 before invoking the new program. This causes the program to send its output to the specified file instead of the user's terminal, and the redirection is transparent to the program itself!

Similarly, the < character tells the shell to use a specific file as file descriptor 0, which forces the program to read its input from that file. In both cases, any errors from the program still appear on the terminal, because they are written to standard error, which is file descriptor 2. (In the bash shell, you can use 2> instead of > to redirect standard error.) This type of file redirection is one of the most powerful features of the Linux command line.
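The redirection described above can be sketched in a few lines. A real shell would do this in the child process between fork() and exec(); the sketch below instead redirects and then restores descriptor 1 within a single process so that it stays self-contained. It uses dup() and dup2(), which this article does not cover, and the helper names redirect_stdout and restore_stdout are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations under strict -std modes */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: makes file descriptor 1 refer to `path`, roughly as
   a shell does for "> path". Returns a saved copy of the old descriptor 1
   (to pass to restore_stdout() later), or -1 on error. */
int redirect_stdout(const char *path) {
    int fd, saved;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    saved = dup(1);                 /* remember where descriptor 1 pointed */
    if (saved < 0) {
        close(fd);
        return -1;
    }
    if (dup2(fd, 1) < 0) {          /* descriptor 1 now refers to the file */
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                      /* descriptor 1 keeps the file open */
    return saved;
}

/* Hypothetical helper: undoes redirect_stdout(). Returns 0 on success. */
int restore_stdout(int saved) {
    int rc = dup2(saved, 1);

    close(saved);
    return rc < 0 ? -1 : 0;
}
```

Between the two calls, anything written to descriptor 1 lands in the file, and the writing code needs no knowledge of the redirection.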

Before using any file-related system call, a program should include <fcntl.h> and <unistd.h>. They provide function prototypes and constants for the most common file routines. In the sample code that follows, we assume that each program begins with

#include <fcntl.h>
#include <unistd.h>

First, let's learn how to read and write files. As you might guess, the read() and write() system calls are the most common ways to perform these operations. Both take three arguments: the file descriptor to access, a pointer to the data to read or write, and the number of bytes to read or write. Each returns the number of bytes successfully read or written. Listing 1 shows a simple program that reads a line from standard input (file descriptor 0) and writes it to standard output (file descriptor 1):

Listing 1:

int main(void) {
    char buf[100];
    ssize_t num;

    num = read(0, buf, sizeof(buf));
    write(1, "I got: ", 7);     /* Length of "I got: " is 7! */
    if (num > 0)                /* skip the echo on error or end of file */
        write(1, buf, num);
    return 0;
}

There are two things worth noting about this program. First, we asked read() for up to 100 bytes, yet if we run the program, the input is delivered as soon as the user presses the Enter key. Many file operations work on a best-effort basis: they try to return all the information the program asked for, but may succeed only in part. By default, a terminal is configured so that a read() call returns as soon as a '\n', or newline character (generated by pressing Enter), is available. This is actually very convenient, because most users expect programs to be line-oriented anyway. However, this is not the case for regular data files, and relying on it may produce unpredictable results.
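When a program really does need a fixed number of bytes, one common approach is to call read() in a loop until the request is satisfied or the end of the file is reached. The following is a minimal sketch of that idea; the helper name read_full is hypothetical, and the retry on EINTR is an assumption about how interrupted calls should be handled.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations under strict -std modes */
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling read() until `count` bytes have been
   stored in `buf`, end of file is reached, or an error occurs. Returns the
   number of bytes actually read, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count) {
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);

        if (n < 0) {
            if (errno == EINTR)     /* interrupted by a signal: try again */
                continue;
            return -1;
        }
        if (n == 0)                 /* end of file: return what we have */
            break;
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

On a terminal this loop keeps waiting for further lines until the buffer is full or the user signals end of file, so it suits data files and pipes better than interactive input.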

The second thing to note is that we did not have to write a '\n' after displaying the output. read() returned the newline the user typed, and write() simply sent it back to standard output. If you want to see what happens without the newline, try changing the final write() call to

write(1, buf, num - 1);

One last point about this simple example: buf definitely does not contain a true C string. A C string is terminated by a single '\0' character that marks the end of the string. Because read() does not add a '\0' to the end of the buffer, using strlen() (or any other C string function) on buf would be a big mistake! This behavior is what allows read() and write() to handle data that contains '\0' characters, which ordinary string functions cannot do.
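If the data is known to be text and string functions are needed, the program has to add the terminator itself. A minimal sketch, assuming it is acceptable to reserve one byte of the buffer for the '\0'; the helper name read_as_string is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations under strict -std modes */
#include <unistd.h>

/* Hypothetical helper: reads at most size - 1 bytes from `fd` into `buf`
   and appends the '\0' that read() itself never writes. Returns the number
   of bytes read (not counting the terminator), or -1 on error. */
ssize_t read_as_string(int fd, char *buf, size_t size) {
    ssize_t n;

    if (size == 0)
        return -1;                  /* no room even for the terminator */

    n = read(fd, buf, size - 1);
    if (n < 0)
        return -1;

    buf[n] = '\0';                  /* now buf is safe to pass to strlen() */
    return n;
}
```

Note that if the input itself contains '\0' bytes, string functions will still stop at the first one, so the return value remains the only reliable length.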

Read () and write () system calls can work on most files. But they do not work for directories. directories should be accessed through special functions (such as readdir. In addition, read () and write () do not work for some types of sockets.

Some files, such as regular files and block device files, have the notion of a file pointer. It specifies where in the file the next read() call will read from and where the next write() call will write to. After a read() or write(), the file pointer is advanced (internally, by the kernel) by the number of bytes processed. This makes it easy to read all the data in a file with a single loop. Listing 2 is an example:

Listing 2:

char buffer[1024];
ssize_t num;

/* Stop on end of file (0) and also on error (-1). */
while ((num = read(0, buffer, sizeof(buffer))) > 0) {
    printf("got some data\n");
}



This loop reads all the data on standard input, and the kernel automatically advances its internal file pointer after each read. When the file pointer reaches the end of the file, read() returns 0 and the loop exits. Some files (such as character devices, of which terminals are a good example) have no file pointer, so this program keeps running until an end-of-file marker is supplied (by pressing Ctrl-D).
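The same loop becomes more useful when it does something with the data. As a sketch, the function below copies everything from one descriptor to another, which is roughly what a bare-bones cat does. The helper name copy_fd is hypothetical, and the inner loop is there because write(), like read(), may process fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations under strict -std modes */
#include <unistd.h>

/* Hypothetical helper: copies all data from `in_fd` to `out_fd` until end
   of file. Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd) {
    char buffer[1024];
    long total = 0;
    ssize_t got;

    while ((got = read(in_fd, buffer, sizeof(buffer))) > 0) {
        ssize_t done = 0;

        /* Repeat write() until the whole chunk has been written. */
        while (done < got) {
            ssize_t put = write(out_fd, buffer + done, (size_t)(got - done));

            if (put < 0)
                return -1;
            done += put;
        }
        total += got;
    }
    return got < 0 ? -1 : total;    /* got == 0 means end of file */
}
```

Calling copy_fd(0, 1) copies standard input to standard output.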

So far, we have learned how to read and write files. The next step is to learn how to open a new file. Different types of files are opened in different ways. The method discussed here opens files that are represented in the file system by a path name, which includes regular files, directories, device files, and named pipes. Some sockets also have path names, but they must be opened by a different method.

Open () system calls allow programs to access most system files. Open () is an unusual system call because it gets two or three independent variables:

int open(const char *pathname,
         int flags);

Or,

int open(const char *pathname,
         int flags,
         mode_t perm);

The first form is more common; it opens an existing file. The second form should be used when a file may need to be created. Its third argument specifies the access permissions for the new file.

The first parameter of open () is the full path name (that is, terminate) represented by a normal C string ). The second parameter specifies how the file should be opened and contains one or more of the following marks for the logical "and" Operation:

O_RDONLY: the file can only be read.
O_WRONLY: the file can only be written.
O_RDWR: the file can be both read and written.
O_APPEND: every write is appended to the end of the file (combine it with O_WRONLY or O_RDWR).
O_CREAT: if the file does not exist, create it.
O_EXCL: if the file already exists, fail instead of opening it (should only be used together with O_CREAT).
O_TRUNC: if the file already exists, remove all data from it (similar to creating a new file).

The third parameter of open () is only required when O_CREAT is used. It specifies the File Permission represented in numbers (the format is the same as the format of the chown command's numerical permission independent variable. The permission specified for open () is affected by the user's umask, which allows the user to specify the default permission that should be obtained for a series of new files. Most programs that create files use the third independent variable 0666 to call open (). You can use umask to control the default permission of the program. (Most shell umask commands can change it .)

For example, Listing 3 shows how to open a file for reading and writing, creating it if it does not exist and discarding any data it already contains:

Listing 3:

int fd;

fd = open("myfile", O_RDWR | O_CREAT | O_TRUNC, 0666);
if (fd < 0) {
    /* Some error occurred */
    /*...*/
}

Open () returns the file descriptor of the referenced file. Recall that the file descriptor is always greater than or equal to 0. If open () returns a negative value, it indicates an error has occurred. The global variable error number contains the Unix error code that describes the problem. Open () always tries to return the minimum number. If the file descriptor 0 is not used, open () always returns 0.

When a process is finished with a file, it should close it through the close() system call. The format of the system call is:

int close(int fd);

close() takes the file descriptor to close as its only argument and returns 0 on success. Although it is rare for close() to fail, it can happen: if the file descriptor refers to a file on a remote server and the system cannot flush its cache properly, close() may actually fail. When a process terminates, the kernel automatically closes any files that are still open.

The last common file operation is moving the file pointer. This (naturally) only makes sense for files that have a file pointer; if you attempt it on an inappropriate file, an error is returned. The lseek() system call is used for this purpose:

off_t lseek(int fd, off_t pos, int whence);

off_t is essentially a fancy name for a long integer (long is where the "l" in lseek comes from). lseek() returns the final position of the file pointer relative to the start of the file, or -1 if an error occurred. The system call takes the file descriptor whose file pointer is to be moved as its first argument and the position to move it to as its second. The last argument describes how the file pointer is moved:

SEEK_SET moves it to pos bytes from the start of the file.
SEEK_END moves it to pos bytes from the end of the file (a negative pos lands before the end).
SEEK_CUR moves it pos bytes from its current position, toward the end of the file.
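The three whence values can be combined to find the size of an open file without disturbing the current position. This is a sketch of one common idiom rather than the only way to do it (fstat() would also work); the helper names file_size and path_size are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations under strict -std modes */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the size of the file behind `fd` and leaves
   the file pointer where it was. Returns -1 on error, for example when
   `fd` refers to a pipe, which has no file pointer. */
off_t file_size(int fd) {
    off_t here, end;

    here = lseek(fd, 0, SEEK_CUR);          /* 0 bytes from here: "where am I?" */
    if (here == (off_t)-1)
        return -1;
    end = lseek(fd, 0, SEEK_END);           /* offset of the end is the size */
    if (end == (off_t)-1)
        return -1;
    if (lseek(fd, here, SEEK_SET) == (off_t)-1)   /* go back */
        return -1;
    return end;
}

/* Hypothetical helper: the same, for a path name. */
off_t path_size(const char *path) {
    int fd = open(path, O_RDONLY);
    off_t size;

    if (fd < 0)
        return -1;
    size = file_size(fd);
    close(fd);
    return size;
}
```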

Together, open(), close(), write(), read(), and lseek() make up the basic file access API for Linux. There are many other functions for manipulating files, but these are the most commonly used.

Most programmers use the familiar ANSI C library file functions, such as fopen() and fread(), rather than the low-level system calls described here. As you would expect, fopen() and fread() are implemented in a user-level library on top of these system calls. It is still quite common to see the low-level system calls used directly, especially in more complex programs. Becoming comfortable with these routines and interfaces puts you well on the way to being a real Unix hacker.
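The relationship between the two layers can be seen by wrapping a file descriptor in a stdio stream. The sketch below uses fdopen(), a POSIX function (not part of ANSI C) that this article does not cover, and the helper name write_via_stdio is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations under strict -std modes */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes `text` to the descriptor `fd` through the
   stdio library. A duplicate of `fd` is wrapped, so the caller's own
   descriptor stays open. Returns 0 on success, -1 on error. */
int write_via_stdio(int fd, const char *text) {
    int copy = dup(fd);
    FILE *fp;
    int ok;

    if (copy < 0)
        return -1;
    fp = fdopen(copy, "w");         /* stdio stream on top of a descriptor */
    if (fp == NULL) {
        close(copy);
        return -1;
    }
    ok = (fputs(text, fp) != EOF);  /* data sits in the user-level buffer */
    if (fclose(fp) == EOF)          /* flushes the buffer, then closes copy */
        ok = 0;
    return ok ? 0 : -1;
}
```

Nothing reaches the kernel until the stdio buffer is flushed, which is why mixing buffered stdio calls and raw write() calls on the same file can reorder output.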

About the author
Erik Troan is a developer at Red Hat Software and one of the authors of Linux Application Development. You can reach him at ewt@redhat.com.