[CSAPP notes] [Chapter 10. System-Level I/O]

Source: Internet
Author: User

Chapter 10: System-Level I/O

Input/output (I/O) is the process of copying data between main memory and external devices (such as disks, terminals, and networks).

    • High-level I/O functions

      • scanf and printf (C)
      • << and >> (C++)
      • Implemented using system-level I/O functions
    • System-level I/O functions

      • Q: Most of the time the high-level I/O functions work well, so why learn Unix I/O?
      • A:
        • Understanding Unix I/O helps you understand other system concepts.
          • To understand the other concepts in depth, you must understand I/O.
        • Sometimes you have no choice but to use Unix I/O.
          • The standard I/O library provides no way to read file metadata,
            • such as file size or file creation time.
          • Using the standard I/O library for network programming is risky.
10.1 Unix I/O
  • A Unix file is a sequence of m bytes:

    • All I/O devices are modeled as files.
    • All input and output is performed by reading and writing the corresponding files.
  • This elegant mapping of devices to files lets the Unix kernel export a simple, low-level application interface, called Unix I/O.

    • It allows all input and output to be performed in a uniform and consistent way.

      • Opening files: the application asks the kernel to open a file.

        • The kernel returns a small non-negative integer, called a descriptor.

          • The descriptor identifies the file in all subsequent operations on it.
          • The kernel keeps track of all information about the open file. The application only needs to remember the descriptor.
        • Each process created by a Unix shell begins with three open files:

          • standard input (descriptor 0)
          • standard output (descriptor 1)
          • standard error (descriptor 2)
          • The header file <unistd.h> defines constants that can be used instead of the explicit descriptor values:
            • STDIN_FILENO
            • STDOUT_FILENO
            • STDERR_FILENO
      • Changing the current file position (the position within a file, not a directory)

        • For each open file, the kernel maintains a file position k.

          • It is initially 0.
          • The file position is a byte offset from the beginning of the file.
        • An application can set the file position explicitly by performing a seek operation (lseek).

      • Reading and writing files.

        • A read operation copies n > 0 bytes from the file to memory, starting at the current file position k, and then increases k by n.

          • Given a file of size m bytes, performing a read operation when k >= m triggers a condition known as end-of-file (EOF).
            • The application can detect this condition (it is a condition, not a signal).
            • There is no explicit "EOF character" at the end of a file.
        • A write operation copies n > 0 bytes from memory to the file, starting at the current file position k, and then updates k.

      • Closing files: when the application has finished accessing a file, it asks the kernel to close the file.

        • In response:

          • The kernel frees the data structures it created when the file was opened.
          • It returns the descriptor to the pool of available descriptors.
        • When a process terminates for any reason, the kernel closes all of its open files.
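The open, write, seek, read, and close steps above can be sketched in C as follows. This is only an illustration: the helper name read_tail_from_offset_7 and the file contents are made up for this example, and error handling is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates path containing "hello, world", moves the
 * file position back to byte offset 7, and reads the remaining bytes into
 * out. Returns the number of bytes read, or -1 on error. */
ssize_t read_tail_from_offset_7(const char *path, char *out, size_t outsz)
{
    assert(outsz >= 5);

    /* Open: the kernel hands back a descriptor; the position starts at 0. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Write: copies 12 bytes to the file and advances the position to 12. */
    if (write(fd, "hello, world", 12) != 12) {
        close(fd);
        return -1;
    }

    /* Seek: explicitly set the position to byte offset 7. */
    if (lseek(fd, 7, SEEK_SET) != 7) {
        close(fd);
        return -1;
    }

    /* Read: copies bytes 7..11 ("world"). The position is then 12 again,
     * so one more read would report EOF by returning 0. */
    ssize_t n = read(fd, out, outsz);

    /* Close: the descriptor returns to the pool of available descriptors. */
    close(fd);
    return n;
}
```

Note that EOF shows up only as a return value of 0 from read; nothing is stored in the file to mark its end.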

10.2 Opening and closing files

A process opens an existing file or creates a new file by calling the open function.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int open(char *filename, int flags, mode_t mode);
        // Returns: new file descriptor if OK, -1 on error

The open function converts a filename to a file descriptor and returns the descriptor number.

  • The descriptor returned is always the smallest descriptor that is not currently open in the process.

  • The flags argument indicates how the process intends to access the file:

    • O_RDONLY: reading only
    • O_WRONLY: writing only
    • O_RDWR: reading and writing
    • flags can also be ORed with one or more bit masks that give additional instructions (think of each mask as a single bit):
      • O_CREAT: if the file does not exist, create a truncated (empty) version of it.
      • O_TRUNC: if the file already exists, truncate it (length set to 0, attributes unchanged).
      • O_APPEND: before each write operation, set the file position to the end of the file.

        Example code:

        // Open an existing file for reading only
        fd = open("foo.txt", O_RDONLY, 0);
        // Open an existing file and append data to its end
        fd = open("foo.txt", O_WRONLY|O_APPEND, 0);
  • The mode argument specifies the access permission bits of new files.

    • Each process has a umask

      • also called the permission mask or file mode creation mask.
      • The permission bits set in the umask are removed from the requested permissions to give the actual permissions.
        • For example, 0777 & ~0022 = 0755 (informally, 777 - 022 = 755).
      • It is set by calling the umask() function.
    • mode is not the actual permission

      • The permission bits of the file are set to mode & ~umask.
    • Example

      #define DEF_MODE   S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH  // everyone may read and write
      #define DEF_UMASK  S_IWGRP|S_IWOTH  // mask off write permission for group and others

      umask(DEF_UMASK);
      fd = open("foo.txt", O_CREAT|O_TRUNC|O_WRONLY, DEF_MODE);
      // Creates a new file that the owner can read and write and that everyone else can only read
      // (write permission for group and others is masked off).
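To check the mode & ~umask rule, a small sketch like the one below creates a file and reads back the permission bits it actually received. The helper name create_with_umask is invented for this example, and fstat is used only to observe the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates path requesting `mode` while the process
 * umask is `mask`, and returns the permission bits the new file really got,
 * or (mode_t)-1 on error. The previous umask is restored before returning. */
mode_t create_with_umask(const char *path, mode_t mode, mode_t mask)
{
    struct stat st;

    assert((mode & ~(mode_t)0777) == 0);   /* permission bits only */

    mode_t old = umask(mask);
    unlink(path);                  /* make sure open() really creates the file */
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, mode);
    umask(old);
    if (fd < 0)
        return (mode_t)-1;

    if (fstat(fd, &st) < 0) {
        close(fd);
        return (mode_t)-1;
    }
    close(fd);
    return st.st_mode & 0777;      /* keep only the permission bits */
}
```

Calling create_with_umask("foo.txt", 0666, 0022) should return 0644 on an ordinary file system: the two write bits named in the mask are cleared and every other requested bit survives.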

The close function closes an open file.

#include <unistd.h>

int close(int fd);
        // Returns: 0 if OK, -1 on error

Closing a descriptor that is already closed is an error.
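Two of the claims in this section can be observed directly: open returns the smallest unused descriptor, and a second close on the same descriptor fails. The sketch below assumes a single-threaded process (otherwise another thread could take the descriptor in between); both function names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if reopening right after close() yields the same descriptor
 * number, 0 if not, -1 on error. Since fd1 was the smallest unused
 * descriptor, it is again the smallest unused one once it is closed. */
int reopen_gets_same_descriptor(const char *path)
{
    assert(path != NULL);
    int fd1 = open(path, O_RDONLY);
    if (fd1 < 0)
        return -1;
    close(fd1);

    int fd2 = open(path, O_RDONLY);
    if (fd2 < 0)
        return -1;
    close(fd2);
    return fd1 == fd2;
}

/* Returns 1 if the second close() fails with EBADF, 0 if not, -1 on error. */
int second_close_fails(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (close(fd) != 0)
        return -1;

    errno = 0;
    return close(fd) == -1 && errno == EBADF;
}
```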

10.3 Reading and writing files

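Applications perform input and output by calling the read and write functions.

#include <unistd.h>

ssize_t read(int fd, void *buf, size_t n);
        // Copies at most n bytes from the current file position of descriptor fd to memory location buf.
        // Returns: number of bytes read if OK, 0 on EOF, -1 on error
ssize_t write(int fd, const void *buf, size_t n);
        // Copies at most n bytes from memory location buf to the current file position of descriptor fd.
        // Returns: number of bytes written if OK, -1 on error

The book shows a program that uses read and write calls to copy standard input to standard output one byte at a time.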
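A minimal sketch of such a byte-at-a-time copy is shown below. It is written as a function over two arbitrary descriptors (the name copy_bytewise is an assumption of this example) so that it can be tried on pipes or files as well as on the standard descriptors; it does not retry after an interrupted call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Copies everything from descriptor in to descriptor out, one byte per
 * read/write call. Returns the number of bytes copied, or -1 on error. */
long copy_bytewise(int in, int out)
{
    char c;
    long total = 0;
    ssize_t n;

    assert(in >= 0 && out >= 0);

    while ((n = read(in, &c, 1)) != 0) {   /* 0 means EOF */
        if (n < 0)
            return -1;
        if (write(out, &c, 1) != 1)
            return -1;
        total++;
    }
    return total;
}
```

To copy standard input to standard output, call copy_bytewise(STDIN_FILENO, STDOUT_FILENO). Every byte costs two system calls, which is why this form is simple but slow.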

By calling the lseek function, an application can explicitly modify the current file position.

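What is the difference between ssize_t and size_t?

  • size_t: an unsigned integer type (unsigned int on the 32-bit systems the book describes; unsigned long on x86-64 Linux).
  • ssize_t: the corresponding signed type (int or long, respectively).
    • It is signed because read and write must be able to return -1 on error.
    • Interestingly, reserving room for this -1 halves the maximum size of a single read.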

In some situations, read and write transfer fewer bytes than the application requests.
The value returned in such a case is called a short count; it does not indicate an error. Short counts occur for the following reasons:

    • Encountering EOF on reads.
    • Reading text lines from a terminal (stdin, STDIN_FILENO)

      • Each read transfers one text line, so the short count equals the size of the text line.
    • Reading and writing network sockets

      • Internal buffering constraints and long network delays can cause read and write to return short counts.
      • To build robust network applications such as Web servers, you must deal with short counts by calling read and write repeatedly until all the requested bytes have been transferred.

With ordinary disk files, you generally do not encounter short counts except at EOF.
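A short count is easy to provoke without a network. The sketch below uses a pipe, which is an assumption of this example (the notes only discuss terminals and sockets), because a pipe also hands back whatever bytes are currently available instead of waiting for the full request.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes 5 bytes into a pipe, then asks read for up to 100 bytes.
 * Returns what read returned (expected: the short count 5), or -1 on error. */
ssize_t short_count_on_pipe(void)
{
    int p[2];
    char buf[100];

    if (pipe(p) < 0)
        return -1;
    if (write(p[1], "hello", 5) != 5) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Only 5 bytes are available, so read returns 5 rather than blocking
     * until all 100 requested bytes arrive. */
    ssize_t n = read(p[0], buf, sizeof buf);

    close(p[0]);
    close(p[1]);
    return n;
}
```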

10.4 Robust reading and writing with the RIO package

The RIO (Robust I/O) package handles the short counts described above automatically.

It provides two different kinds of functions:

    • Unbuffered input and output functions

      • Data is transferred directly between memory and a file, with no application-level buffering.
      • They are especially useful for reading and writing binary data to and from networks.
    • Buffered input functions

      • They let you efficiently read text lines and binary data from a file
      • whose contents are cached in an application-level buffer.
    • The buffered RIO input functions are thread-safe, and calls on the same descriptor can be interleaved arbitrarily.

10.4.1 RIO unbuffered input and output functions

    • Differences between rio_readn/rio_writen and the ordinary read and write:
      • They hide short counts, including those that occur on network sockets.
        • rio_readn returns a short count only when it encounters EOF; rio_writen never returns a short count.
      • Thread-safe.
      • If read or write is interrupted by the return from an application signal handler, the call is restarted manually.
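The behavior listed above can be sketched as two short loops. The code below is modeled on the book's rio_readn and rio_writen but is written from the description in these notes, so treat it as an approximation and not as the book's exact listing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads up to n bytes, retrying until n bytes arrive or EOF is hit.
 * Returns the number of bytes read (short only on EOF), or -1 on error. */
ssize_t rio_readn(int fd, void *usrbuf, size_t n)
{
    size_t nleft = n;
    char *bufp = usrbuf;

    assert(usrbuf != NULL);

    while (nleft > 0) {
        ssize_t nread = read(fd, bufp, nleft);
        if (nread < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a handler: restart */
            return -1;
        }
        if (nread == 0)
            break;                 /* EOF */
        nleft -= (size_t)nread;
        bufp += nread;
    }
    return (ssize_t)(n - nleft);
}

/* Writes exactly n bytes, retrying after short counts.
 * Returns n on success, or -1 on error; never a short count. */
ssize_t rio_writen(int fd, const void *usrbuf, size_t n)
{
    size_t nleft = n;
    const char *bufp = usrbuf;

    while (nleft > 0) {
        ssize_t nwritten = write(fd, bufp, nleft);
        if (nwritten <= 0) {
            if (nwritten < 0 && errno == EINTR)
                continue;          /* interrupted by a handler: restart */
            return -1;
        }
        nleft -= (size_t)nwritten;
        bufp += nwritten;
    }
    return (ssize_t)n;
}
```

Neither function keeps any state between calls, which is why calls on the same descriptor can be mixed freely.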


10.4.2 RIO buffered input functions

A text line is a sequence of ASCII characters terminated by a newline character.

    • On Unix systems, the newline character (\n) is the same as the ASCII line feed character (LF) and has the numeric value 0x0a.
Introducing rio_readnb and rio_readlineb

Suppose we want to write a program that counts the number of text lines in a text file.

    • Approach 1: use the read function to transfer one byte at a time from the file to user memory, checking each byte for the newline character.

      • Inefficient: reading each byte requires a trap into the kernel.
    • A better approach is to call a wrapper function, rio_readlineb.

      • It copies a text line from an internal read buffer.

        • When the buffer becomes empty, it automatically calls read to refill it.
      • Why is this faster?

        • One read call fetches many bytes at once, so the cost of each system call is spread over a whole buffer. This exploits spatial locality: the bytes that follow are likely to be needed next.
    • rio_readnb is the buffered version of rio_readn.
      • It is meant for files that contain both text lines and binary data (for example, the HTTP responses described in Section 11.5.3).
      • It transfers raw bytes from the same read buffer as rio_readlineb.
rio_readinitb, rio_readnb, and rio_readlineb in use

The rio_readinitb function is called once per open descriptor.

    • It associates the descriptor fd with a read buffer of type rio_t at address rp.

    • rio_readlineb(&rio, buf, MAXLINE)

      • The rio_readlineb function reads the next text line (including the terminating newline character) from rio (the buffer), copies it to memory location buf, and terminates it with a null (\0) character.
      • rio_readlineb reads at most MAXLINE-1 bytes; longer text lines are truncated, and the result always ends with \0.
    • rio_readnb(&rio, buf, n)

      • The rio_readnb function reads up to n bytes from rio into buf.
    • Calls to rio_readlineb and rio_readnb can be interleaved arbitrarily on the same descriptor.

      • But calls to the buffered functions should not be interleaved with calls to the unbuffered rio_readn.

The remainder of the section gives numerous examples of the RIO functions.

    • Figure 10-5 shows the format of a read buffer and the code of the rio_readinitb function that initializes it.
      • The rio_readinitb function sets up an empty read buffer and associates an open file descriptor with it.

    • The rio_read function shown in Figure 10-6 is the heart of the RIO read routines.

      • rio_read is a buffered version of the Unix read function.
        • It is called with a request to read n bytes.
        • If the buffer is empty at that point, it is refilled with a call to read, which may fill it only partially.
        • Then min(n, rp->rio_cnt) unread bytes are copied from the read buffer to the user buffer.
    • To an application, rio_read has the same semantics as the Unix read function.

      • The one possible difference is the short counts they return.

        • Apart from short counts, the two behave identically.
        • Wrapping the call in a loop until the full request is satisfied removes the difference.
        • That is what rio_readn does for read and what rio_readnb does for rio_read.
      • The similarity makes it easy to build buffered functions by substituting rio_read for read.

        • For example, rio_readnb has the same structure as rio_readn, with rio_read in place of read.
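The pieces described in this section (the rio_t buffer, rio_readinitb, rio_read, and rio_readlineb) fit together roughly as sketched below. The code follows the description in these notes and is not a copy of Figures 10-5 and 10-6; the buffer size of 8192 and the count_lines helper are assumptions added for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RIO_BUFSIZE 8192        /* assumed size for this sketch */

typedef struct {
    int rio_fd;                 /* descriptor this buffer reads from */
    ssize_t rio_cnt;            /* unread bytes left in rio_buf */
    char *rio_bufptr;           /* next unread byte in rio_buf */
    char rio_buf[RIO_BUFSIZE];  /* internal read buffer */
} rio_t;

/* Associates descriptor fd with an empty read buffer. */
void rio_readinitb(rio_t *rp, int fd)
{
    rp->rio_fd = fd;
    rp->rio_cnt = 0;
    rp->rio_bufptr = rp->rio_buf;
}

/* Buffered read: refills the buffer when empty, then copies
 * min(n, rio_cnt) bytes to usrbuf. */
static ssize_t rio_read(rio_t *rp, char *usrbuf, size_t n)
{
    while (rp->rio_cnt <= 0) {
        rp->rio_cnt = read(rp->rio_fd, rp->rio_buf, sizeof rp->rio_buf);
        if (rp->rio_cnt < 0) {
            if (errno != EINTR)
                return -1;              /* real error; EINTR retries */
        } else if (rp->rio_cnt == 0) {
            return 0;                   /* EOF */
        } else {
            rp->rio_bufptr = rp->rio_buf;
        }
    }
    size_t cnt = n < (size_t)rp->rio_cnt ? n : (size_t)rp->rio_cnt;
    memcpy(usrbuf, rp->rio_bufptr, cnt);
    rp->rio_bufptr += cnt;
    rp->rio_cnt -= (ssize_t)cnt;
    return (ssize_t)cnt;
}

/* Reads one text line of at most maxlen-1 bytes and NUL-terminates it.
 * Returns the number of bytes stored, 0 on EOF, -1 on error. */
ssize_t rio_readlineb(rio_t *rp, void *usrbuf, size_t maxlen)
{
    char c, *bufp = usrbuf;
    size_t n;

    assert(maxlen >= 1);
    for (n = 1; n < maxlen; n++) {
        ssize_t rc = rio_read(rp, &c, 1);
        if (rc == 1) {
            *bufp++ = c;
            if (c == '\n') {
                n++;
                break;
            }
        } else if (rc == 0) {
            if (n == 1)
                return 0;       /* EOF before any data */
            break;              /* EOF after some data */
        } else {
            return -1;
        }
    }
    *bufp = '\0';
    return (ssize_t)(n - 1);
}

/* Hypothetical helper: counts newline-terminated lines readable from fd. */
long count_lines(int fd)
{
    rio_t rio;
    char line[1024];
    ssize_t n;
    long lines = 0;

    rio_readinitb(&rio, fd);
    while ((n = rio_readlineb(&rio, line, sizeof line)) > 0)
        if (line[n - 1] == '\n')
            lines++;
    return n < 0 ? -1 : lines;
}
```

rio_readlineb still asks for one byte at a time, but each request is served from rio_buf, so the kernel is entered only once per buffer refill.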

10.5 Reading file metadata

Applications can retrieve information about a file (sometimes called the file's metadata) by calling the stat and fstat functions.

#include <unistd.h>
#include <sys/stat.h>

int stat(const char *filename, struct stat *buf);
int fstat(int fd, struct stat *buf);
        // Fill in the members of the stat data structure.
        // Returns: 0 if OK, -1 on error

    • The st_size member contains the file size in bytes.
    • The st_mode member encodes both the file permission bits and the file type.
      • File types
        • Regular file: what we usually mean by a "file".
        • Directory file: contains information about other files.
        • Socket: a file used to communicate with other processes across a network.
      • Unix provides macro predicates that determine the file type from st_mode; some of them are:
        • S_ISREG()   # Is this a regular file?
        • S_ISDIR()   # Is this a directory file?
        • S_ISSOCK()  # Is this a network socket?
        • They are defined in sys/stat.h.

Figure 10-10 shows how to use these macros and the stat function to read and interpret a file's st_mode bits.
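As a rough stand-in for that figure, the sketch below classifies a path with the st_mode macros and reports its size. The function names are made up for this example, and the output format is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns a short name for the file type of path, or "error". */
const char *file_type(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) < 0)
        return "error";
    if (S_ISREG(st.st_mode))
        return "regular";
    if (S_ISDIR(st.st_mode))
        return "directory";
    if (S_ISSOCK(st.st_mode))
        return "socket";
    return "other";
}

/* Returns the size of path in bytes (st_size), or -1 on error. */
off_t file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) < 0 ? (off_t)-1 : st.st_size;
}

/* Prints the type, size, and owner-read permission of path. */
int print_file_info(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0) {
        perror(path);
        return -1;
    }
    printf("%s: type=%s size=%lld owner-readable=%s\n",
           path, file_type(path), (long long)st.st_size,
           (st.st_mode & S_IRUSR) ? "yes" : "no");
    return 0;
}
```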

10.6 Sharing files

Unless you know how the kernel represents open files, the idea of file sharing is quite hard to understand.

The kernel represents open files using three related data structures:

    • Descriptor table:

      • Each process has its own separate descriptor table.
      • Its entries are indexed by the process's open file descriptors.
      • Each open descriptor entry points to an entry in the file table.
    • File table: the set of open files is represented by a file table.

      • All processes share this table.
      • Each file table entry includes
        • the current file position,
        • a reference count: the number of descriptor entries that currently point to it,
          • Closing a descriptor decrements the reference count in the associated file table entry.
          • The kernel deletes the file table entry only when its reference count reaches 0.
        • and a pointer to an entry in the v-node table.
    • v-node table:

      • All processes share this table.
      • Each entry contains most of the information in the stat structure, including
        • st_mode
        • st_size

There are three possible scenarios for open files:

The typical case (no sharing)

    • Two descriptors reference two distinct files through distinct file table entries, and the files are at different locations on disk.
    • Nothing is shared.

Sharing scenario 1

    • Multiple descriptors can also reference the same file through different file table entries.
    • The content is the same (the entries lead to the same v-node and the same disk blocks), but the file positions are separate.
    • Example
      • This can happen if you call open twice with the same filename.
      • Each descriptor has its own file position, so reads on different descriptors can fetch data from different locations in the file.
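This scenario can be observed with a short experiment: open the same file twice and read one byte through each descriptor. The sketch below creates its own test file containing "foobar"; the function names and the file contents are assumptions of this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates path with the six bytes "foobar". Returns 0 if OK, -1 on error. */
static int make_foobar(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "foobar", 6);
    close(fd);
    return n == 6 ? 0 : -1;
}

/* Opens path twice and reads one byte through each descriptor.
 * Each open() gets its own file table entry and therefore its own file
 * position, so both reads start at offset 0. Returns 0 if OK, -1 on error. */
int read_via_two_opens(const char *path, char *c1, char *c2)
{
    assert(c1 != NULL && c2 != NULL);
    if (make_foobar(path) < 0)
        return -1;

    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);
    if (fd1 < 0 || fd2 < 0) {
        if (fd1 >= 0) close(fd1);
        if (fd2 >= 0) close(fd2);
        return -1;
    }

    int ok = read(fd1, c1, 1) == 1 && read(fd2, c2, 1) == 1;
    close(fd1);
    close(fd2);
    return ok ? 0 : -1;
}
```

Both c1 and c2 end up holding 'f': the read on fd1 advances only the position stored in fd1's file table entry.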

Sharing between parent and child processes

We can also understand how parent and child processes share files.

    • After a call to fork, the child has its own duplicate copy of the parent's descriptor table.
    • Parent and child share the same set of open file table entries.

      • They therefore share the same file position.
    • An important consequence:

      • The parent and child must both close their descriptors before the kernel will delete the corresponding file table entry.
      • It is not enough for the parent to call close(fd1).
        • The child also needs to call close(fd1).
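The shared file position can be seen by letting the child read first. In the sketch below (function name and file contents are assumptions of this example), the child consumes one byte, and the parent's next read continues where the child stopped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Writes "foobar" to path, rewinds, forks a child that reads one byte,
 * then reads one byte in the parent and stores it in *c.
 * Returns 0 if OK, -1 on error. */
int read_after_child(const char *path, char *c)
{
    assert(c != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "foobar", 6) != 6 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: its descriptor table is a copy, but the entry points to
         * the same file table entry, so this read moves the shared
         * position from 0 to 1. Exiting closes the child's descriptors. */
        char tmp;
        _exit(read(fd, &tmp, 1) == 1 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* Parent: reads at the shared position, which is now 1. */
    ssize_t n = read(fd, c, 1);
    close(fd);      /* the parent must still close its own descriptor */
    return n == 1 ? 0 : -1;
}
```

After the call, *c holds 'o' (the second byte), not 'f'.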
10.7 I/O redirection

Unix shells provide I/O redirection operators that allow users to associate standard input and output with disk files.

    • For example

      unix> ls > foo.txt
      • causes the shell to load and execute the ls program, with standard output redirected to the disk file foo.txt.
    • A Web server performs a similar kind of redirection when it runs a CGI program on behalf of a client.

How does I/O redirection work?

    • One way is to use the dup2 function:

      #include <unistd.h>

      int dup2(int oldfd, int newfd);
              // Returns: non-negative descriptor if OK, -1 on error
      • The dup2 function copies descriptor table entry oldfd to descriptor table entry newfd, overwriting the previous contents of newfd.
        • If newfd was already open, dup2 closes newfd before it copies oldfd.
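A redirection like ls > foo.txt can be imitated inside one process. The sketch below points descriptor 1 at a file with dup2, writes through descriptor 1, and then restores the original standard output. The function name is made up, and the use of dup to save the old descriptor is an addition of this example that the notes do not cover.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Temporarily redirects standard output to path, writes msg through
 * STDOUT_FILENO, and restores standard output. Returns 0 if OK, -1 on error. */
int write_stdout_to_file(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int saved = dup(STDOUT_FILENO);     /* remember the original stdout */
    if (saved < 0) {
        close(fd);
        return -1;
    }

    /* Copy entry fd over entry 1; dup2 closes the old descriptor 1 first. */
    if (dup2(fd, STDOUT_FILENO) < 0) {
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);      /* descriptor 1 still references the file table entry */

    size_t len = strlen(msg);
    ssize_t n = write(STDOUT_FILENO, msg, len);   /* lands in the file */

    dup2(saved, STDOUT_FILENO);         /* put the original stdout back */
    close(saved);
    return n == (ssize_t)len ? 0 : -1;
}
```

A shell does the same thing in the child process between fork and exec, except that it never needs to restore the original descriptor.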

Left and right hoinkies

  • Right hoinky: >
  • Left hoinky: <
10.8 Standard I/O

ANSI C defines a set of higher-level input and output functions called the standard I/O library.

  • This library (libc) provides

    • functions for opening and closing files (fopen and fclose)
    • functions for reading and writing bytes (fread and fwrite)
    • functions for reading and writing strings (fgets and fputs)
    • sophisticated formatted I/O functions (scanf and printf)
  • The standard I/O library models an open file as a stream.

    • To the programmer, a stream is a pointer to a structure of type FILE.
    • Every ANSI C program begins with three open streams:

      • stdin: standard input
      • stdout: standard output
      • stderr: standard error

        #include <stdio.h>
        extern FILE *stdin;
        extern FILE *stdout;
        extern FILE *stderr;
  • A stream of type FILE is an abstraction for a file descriptor and a stream buffer.

    • The stream buffer serves the same purpose as the RIO read buffer:
      • to minimize the number of expensive Unix I/O system calls.
10.9 Synthesis: What I/O functions should I use

The figure summarizes the I/O packages we have discussed.

    • Unix I/O
    • RIO
    • Standard I/O
      • The method of choice for I/O on disk and terminal devices.
      • It poses some problems when used for network input and output.
        • The Unix abstraction for a network is a type of file called a socket.
          • Like any Unix file, a socket is referenced by a file descriptor, known in this case as a socket descriptor.
          • Application processes communicate with processes running on other computers by reading and writing socket descriptors.
      • Most C programmers use standard I/O exclusively throughout their careers.

Standard I/O streams are full duplex in the sense that a program can perform input and output on the same stream.

However, there are restrictions on streams that interact badly with restrictions on sockets, and very little documentation describes these phenomena:

(I do not fully understand this part yet.)

    • Restriction 1: input functions following output functions.
      • An input function cannot follow an output function without an intervening call to fflush, fseek, fsetpos, or rewind.
        • The fflush function empties the buffer associated with the stream.
        • The latter three functions use the Unix I/O lseek function to reset the current file position.
    • Restriction 2: output functions following input functions.
      • An output function cannot follow an input function without an intervening call to fseek, fsetpos, or rewind, unless the input function encounters an end-of-file.

(Still unclear to me; I will come back to this after more reading.)

Therefore, we recommend that you not use the standard I/O functions for input and output on network sockets. Use the RIO functions instead.

    • If you need formatted output

      • Use the sprintf function to format a string in memory.
      • Then send it to the socket using rio_writen.
    • If you need formatted input

      • Use rio_readlineb to read an entire text line.
      • Then use sscanf to extract the different fields from the text line.
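The recommended pattern (format in memory, then send raw bytes; read a whole line, then parse it) might look like the sketch below. Everything here is illustrative: the "name value" line format and the function names are invented, write_all is a local stand-in for rio_writen so that the example is self-contained, and snprintf is used as the bounded variant of sprintf.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for rio_writen: keeps calling write until all n bytes are sent. */
static ssize_t write_all(int fd, const char *buf, size_t n)
{
    size_t left = n;

    while (left > 0) {
        ssize_t w = write(fd, buf, left);
        if (w <= 0) {
            if (w < 0 && errno == EINTR)
                continue;       /* interrupted: try again */
            return -1;
        }
        left -= (size_t)w;
        buf += w;
    }
    return (ssize_t)n;
}

/* Formatted output: build the text line in memory, then send its bytes. */
int send_reading(int fd, const char *name, int value)
{
    char line[128];

    assert(name != NULL);
    int len = snprintf(line, sizeof line, "%s %d\n", name, value);
    if (len < 0 || (size_t)len >= sizeof line)
        return -1;              /* formatting failed or line too long */
    return write_all(fd, line, (size_t)len) == len ? 0 : -1;
}

/* Formatted input: given one complete NUL-terminated text line (such as
 * rio_readlineb produces), extract the fields with sscanf. */
int parse_reading(const char *line, char name[64], int *value)
{
    return sscanf(line, "%63s %d", name, value) == 2 ? 0 : -1;
}
```

Because the stream buffers of standard I/O are never involved, the two restrictions on mixing input and output functions do not come into play.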
10.10 Summary
