Core Python Programming (Second Edition): Files and Input/Output

1. File objects

A file object can be used to access not only ordinary disk files but also any other kind of "file" that uses the same abstraction. Once the appropriate "hooks" are installed, you can access other objects that offer a file-style interface exactly as you would a normal file. A file is simply a contiguous sequence of bytes. Data is usually transferred as a byte stream, whether that stream consists of single bytes or large blocks of data.

2. File built-in functions: open() and file()

The built-in function open() (and file()) provides a general interface for initiating input/output (I/O) operations; it is the "key" that opens the door to a file.

open() returns a file object when the file is opened successfully; when the operation fails, Python raises an IOError exception. The basic syntax of open() is:

file_object = open(file_name, access_mode='r', buffering=-1)

file_name is a string containing the name of the file to open; it may be a relative or an absolute path. The optional access_mode argument is also a string, giving the mode in which the file is opened. Files are usually opened with 'r', 'w', or 'a', for reading, writing, and appending respectively. There is also a 'U' mode, which provides universal newline support. A file opened with 'r' or 'U' must already exist. A file opened with 'w' is truncated first if it exists and is then (re)created. A file opened with 'a' is prepared for appending: all written data goes to the end of the file, even if you seek elsewhere. If the file does not exist, it is created automatically, just as if it had been opened with 'w'.

If access_mode is not given, it defaults to 'r'. The other optional argument, buffering, selects the buffering used when accessing the file: 0 means no buffering, 1 means line buffering, and any value greater than 1 requests a buffer of approximately that size. Omitting the argument, or passing a negative value, selects the system default, which is line buffering for any teletype-like (tty) device and normal buffering for everything else. The system default is usually the right choice.
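The three basic modes can be tried out with a short sketch. It assumes Python 3 syntax (the text itself describes Python 2) and uses a scratch file in a temporary directory; the file name is purely illustrative.

```python
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# 'w' creates the file, or truncates it if it already exists.
f = open(path, "w")
f.write("first\n")
f.close()

# 'a' always writes at the end of the file, creating it if needed.
f = open(path, "a")
f.write("second\n")
f.close()

# 'r' (the default) requires the file to exist.
f = open(path, "r")
contents = f.read()
f.close()

# Opening a missing file for reading raises IOError
# (in Python 3, IOError is an alias of OSError).
try:
    open(path + ".missing", "r")
    missing_raised = False
except IOError:
    missing_raised = True
```

After the two writes, contents holds both lines in order, which shows that 'a' did not overwrite what 'w' had written.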

Access modes for file objects
File mode     Operation
r             Open for reading
rU or U (a)   Open for reading with universal newline support (PEP 278)
w             Open for writing (truncate if necessary)
a             Open for appending (always writes at EOF; creates the file if necessary)
r+            Open for reading and writing
w+            Open for reading and writing (see w)
a+            Open for reading and writing (see a)
rb            Open for binary reading
wb            Open for binary writing (see w)
ab            Open for binary appending (see a)
rb+           Open for binary reading and writing (see r+)
wb+           Open for binary reading and writing (see w+)
ab+           Open for binary reading and writing (see a+)
(a) New in Python 2.3

The open() and file() functions do the same thing and are interchangeable. In general, we recommend using open() to read and write files.

Universal newline support (UNS): when a file is opened with the 'U' flag, every line terminator, whatever it originally was, is returned as a newline character (\n) by Python's input methods (for example, the read*() methods). (The 'rU' form is also accepted, to parallel 'rb'.) This also works for files that mix several kinds of line terminator. The newlines attribute of the file object records the line terminators it has "seen" so far.

Note that UNS applies only to reading text files; there is no equivalent for file output.

UNS is enabled by default when Python is built. If you do not want it, pass the --without-universal-newlines switch to the configure script.
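As a rough illustration of the translation described above, the sketch below assumes Python 3, where text mode with newline=None (the default) behaves like the 'U' flag of Python 2. The scratch file name is an assumption made for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mixed.txt")

# Write raw bytes so the three terminator styles reach the disk unchanged.
with open(path, "wb") as f:
    f.write(b"unix\nold mac\rdos\r\n")

# Text mode with newline=None translates every terminator to '\n' on input.
with open(path, "r", newline=None) as f:
    lines = f.readlines()
    seen = f.newlines  # the terminators encountered while reading
```

Every element of lines ends in '\n', while seen still reports which original terminators were present in the file.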

3. File built-in methods

After open() succeeds and returns a file object, all further access to the file goes through that handle. File methods fall into four categories: input, output, movement within the file, and miscellaneous operations.

1) input

The read() method reads bytes directly into a string, up to the given number of bytes. If no size argument is given (the default is -1), or if size is negative, the file is read to the end.

The readline() method reads one line of the open file, that is, all bytes up to and including the next line terminator, and returns the whole line, terminator included, as a string. It also takes an optional size argument, defaulting to -1, which means read up to the line terminator. If size is given, an incomplete line may be returned once size bytes have been read.

The readlines() method does not return a single string as the other two input methods do. It reads all (remaining) lines and returns them as a list of strings. Its optional argument, sizhint, is a hint for the maximum number of bytes to return. If it is greater than 0, the returned lines total approximately sizhint bytes (possibly slightly more, because the amount is rounded up to a whole internal buffer).
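The three input methods can be compared side by side. This is a minimal sketch assuming Python 3 and a small scratch file created only for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "input.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path) as f:
    head = f.read(3)              # at most 3 characters
    rest_of_line = f.readline()   # remainder of the first line, terminator included
    partial = f.readline(2)       # size given: an incomplete line may come back
    remaining = f.readlines()     # every remaining line, as a list of strings
```

Each call continues from where the previous one stopped, so the four results together cover the whole file exactly once.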

2) output

The write() built-in method writes a string containing text or a block of binary data to the file.

The writelines() method works on a list: it takes a list of strings and writes them to the file. Line terminators are not added automatically, so if you need them you must append one to each line before calling writelines().

Core note: line separators are preserved. When you read lines from a file with an input method such as read() or readlines(), Python does not remove the line terminators; that is left to the programmer. Likewise, the output methods write() and writelines() do not add line terminators; you must add them yourself before writing the data.
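The round trip described in the note might look like the following sketch (Python 3 assumed; the list of names and the scratch file are made up for the example).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "names.txt")
names = ["ann", "bob", "cy"]

# writelines() adds nothing, so attach the terminator to each item first.
with open(path, "w") as f:
    f.writelines([name + "\n" for name in names])

# readlines() keeps the terminators; stripping them is up to the caller.
with open(path) as f:
    raw = f.readlines()
stripped = [line.rstrip("\n") for line in raw]
```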

3) Move within file

The seek() method moves the file pointer to a different position in the file. The offset, in bytes, is taken relative to a reference position, whence. The default whence is 0, the beginning of the file (so the offset is absolute); 1 means the current position and 2 means the end of the file.

The tell() method complements seek(); it reports where the file pointer currently is, in bytes from the beginning of the file.

4) file Iteration

Since iterators and file iteration were introduced in Python 2.2, file objects are their own iterators. The iterator's next method, file.next(), reads the next line of the file, so a file can be walked with a plain for loop. File iteration is more efficient, and the resulting Python code is easier to write (and read).
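File iteration can be sketched as follows. The example assumes Python 3, where the built-in next() takes the place of the Python 2 method call file.next(); the scratch file is illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "iter.txt")
with open(path, "w") as f:
    f.write("one\ntwo\nthree\n")

lengths = []
with open(path) as f:
    first = next(f)        # fetch a single line through the iterator protocol
    for line in f:         # the loop continues with the lines that remain
        lengths.append(len(line.rstrip("\n")))
```

Because the file object is its own iterator, the for loop picks up after the line already consumed by next().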

5) Other

close() ends access to a file by closing it. Python's garbage collection also closes a file automatically when the reference count of the file object drops to zero.

The fileno() method returns the descriptor of the open file. This is an integer that can be used in lower-level operations, such as those provided by the os module.

Calling flush() writes the contents of the internal buffer to the file immediately, rather than waiting for the output buffer to be written on its own.

isatty() is a Boolean built-in method that returns True when the file is a tty-like device and False otherwise.

The truncate() method truncates the file at the current file pointer position or at a given size, in bytes.
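The miscellaneous methods can be exercised together in one short sketch. It assumes Python 3 and opens the scratch file in binary mode so that positions are plain byte offsets.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "misc.bin")

f = open(path, "w+b")
f.write(b"0123456789")
f.flush()                             # push buffered data out to the file now
size_before = os.path.getsize(path)   # size on disk after the flush
fd = f.fileno()                       # integer descriptor for low-level calls
is_tty = f.isatty()                   # a disk file is not a tty-like device

f.seek(4)
f.truncate()                          # no size given: cut at the current position
f.seek(0)
kept = f.read()

f.close()
closed = f.closed
```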

6) Miscellaneous file method notes

Core note: line separators and other file system differences.

One difference between operating systems is the line separator each supports. On POSIX systems (the Unix family and Mac OS X) the line separator is the newline character (\n). The old Mac OS used carriage return (\r), and DOS and Win32 systems use both (\r\n).

Another difference is the path separator (POSIX uses "/", DOS and Windows use "\", and older versions of Mac OS use ":"), which separates the components of a file path name; the notation for the current and parent directories differs as well.

The designers of Python's os module have already thought about these issues for us. The os module has five useful attributes.

os module attributes that help with cross-platform development
os module attribute   Description
linesep               String used to separate lines in a file
sep                   String used to separate the components of a file path name
pathsep               String used to separate a list of file paths
curdir                String name of the current working directory
pardir                String name of the parent directory (of the current working directory)

Whatever platform you are on, once you import the os module these variables are set to the correct values automatically, saving you the trouble.
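A small sketch shows how these attributes keep path handling portable. The directory and file names below are hypothetical and exist only for the illustration.

```python
import os

# Collect the five attributes so they can be inspected on the current platform.
portable = {
    "linesep": os.linesep,
    "sep": os.sep,
    "pathsep": os.pathsep,
    "curdir": os.curdir,
    "pardir": os.pardir,
}

# Build a relative path by hand from pardir and sep (hypothetical names).
rel = os.pardir + os.sep + "data" + os.sep + "log.txt"

# Build a search-path string of the kind used in environment variables.
search_path = os.pathsep.join(["bin", "scripts"])
```

In everyday code os.path.join() is the more convenient way to assemble paths; building rel by hand here only makes the role of sep and pardir visible.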

The print statement ends its output with a newline by default; a trailing comma after the statement suppresses it. readline() and readlines() do not strip any whitespace from the lines they return, so the comma is needed when printing those lines. If you omit it, the displayed text has two line breaks after each line: one that came with the input and one added automatically by print.

File objects also have a truncate() method that takes an optional size argument. If size is given, the file is truncated to at most size bytes. If it is omitted, the file is truncated at the current position. For example, if you open a file and immediately call truncate(), the contents of the file are effectively deleted, because you are truncating from byte 0 (the value tell() returns at that point).

import os

filename = input('Enter file name: ')  # use raw_input() on Python 2
fobj = open(filename, 'w')
while True:
    aLine = input("Enter a line ('.' to quit): ")
    if aLine != ".":
        # append the platform's line terminator to each line
        fobj.write('%s%s' % (aLine, os.linesep))
    else:
        break
fobj.close()

The following Python 2 interactive session shows seek() and tell() at work:

>>> f = open('/tmp/x', 'w+')
>>> f.tell()
0
>>> f.write('test line 1\n')  # add a 12-byte string [0-11]
>>> f.tell()
12
>>> f.write('test line 2\n')  # add a 12-byte string [12-23]
>>> f.tell()                  # report the current position
24
>>> f.seek(-12, 1)            # move back 12 bytes
>>> f.tell()                  # at the beginning of the second line
12
>>> f.readline()
'test line 2\n'
>>> f.seek(0, 0)              # back to the beginning
>>> f.readline()
'test line 1\n'
>>> f.tell()                  # back at the second line
12
>>> f.readline()
'test line 2\n'
>>> f.tell()                  # at the end again
24
>>> f.close()                 # close the file

seek(off, whence=0): moves the file position marker (the file pointer) off bytes from the reference position; a positive off moves toward the end of the file and a negative off toward the beginning. The whence argument sets the reference position: 0 is the start of the file, 1 the current position, and 2 the end of the file. [Note: to compute positions relative to the end of the file, open the file in 'b' mode.]
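The note about binary mode can be illustrated with a short sketch. It assumes Python 3, where end-relative and current-relative seeks with a nonzero offset are only allowed on files opened in 'b' mode; the scratch file reuses the two test lines from the session above.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek.bin")
with open(path, "wb") as f:
    f.write(b"test line 1\ntest line 2\n")

with open(path, "rb") as f:
    f.seek(-12, 2)            # whence=2: 12 bytes before the end of the file
    pos_from_end = f.tell()
    last = f.readline()

    f.seek(0, 0)              # whence=0: absolute offset from the start
    f.seek(5, 1)              # whence=1: 5 bytes forward from the current position
    word = f.read(4)
```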

List of built-in methods for file objects
File object method                Operation
file.close()                      Close the file
file.fileno()                     Return the file descriptor (fd, an integer) of the file
file.flush()                      Flush the internal buffer of the file
file.isatty()                     Return whether the file is a tty-like device
file.next()                       Return the next line of the file (like file.readline()), or raise StopIteration when there are no more lines
file.read(size=-1)                Read size bytes from the file and return them as a string; read all remaining bytes if size is omitted or negative
file.readinto(buf, size)          Read size bytes from the file into the buffer buf (unsupported)
file.readline(size=-1)            Read and return one line from the file (including the line terminator), or at most size characters
file.readlines(sizhint=0)         Read all lines of the file and return them as a list (line terminators included); if sizhint is greater than 0, return lines totaling approximately sizhint bytes, rounded up to the next buffer size (translator's note: for example, if buffers come only in multiples of 4 KB and sizhint is 15 KB, about 16 KB may be returned)
file.xreadlines()                 A more efficient iteration method that can replace readlines()
file.seek(off, whence=0)          Move the file pointer off bytes from whence (0 for the start of the file, 1 for the current position, 2 for the end of the file)
file.tell()                       Return the current position in the file
file.truncate(size=file.tell())   Truncate the file to at most size bytes; defaults to the current file position
file.write(str)                   Write the string str to the file
file.writelines(seq)              Write the sequence of strings seq to the file; seq should be an iterable that yields strings (before 2.2 it had to be a list of strings)

4. File built-in attributes

Attributes of file objects
File object attribute   Description
file.closed             True if the file has been closed, False otherwise
file.encoding           Encoding used by the file: when Unicode strings are written to the file, they are converted to byte strings using file.encoding; if file.encoding is None, the system default encoding is used
file.mode               Access mode with which the file was opened
file.name               Name of the file
file.newlines           None if no line separator has been read yet; a string if only one kind of line separator has been seen; a tuple containing every kind seen so far if the file has more than one
file.softspace          0 if a space is explicitly required with print, 1 otherwise; this attribute is rarely used by programmers and is generally for internal use

5. Standard files

In general, three standard files are available as soon as your program starts: standard input (usually the keyboard), standard output (buffered output to the display), and standard error (unbuffered output to the screen). ("Buffered" and "unbuffered" here refer to the third argument of open().) These files keep their C names: stdin, stdout, and stderr.

In Python, the handles for these files are available through the sys module. After importing sys, you can reach them as sys.stdin, sys.stdout, and sys.stderr. The print statement normally writes to sys.stdout, and the built-in raw_input() normally reads from sys.stdin.

Remember that the sys.* objects are files, so when you write to them directly you must handle line breaks yourself. The print statement, by contrast, automatically appends a newline to the string it outputs.
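Since sys.stdout is an ordinary file object, anything with a write() method can stand in for it. The sketch below assumes Python 3; report() is a hypothetical helper written for this example, not part of any library.

```python
import io
import sys

def report(message, stream=None):
    """Write message plus a newline to stream (sys.stdout by default)."""
    if stream is None:
        stream = sys.stdout
    # write() adds nothing by itself, so the newline is appended explicitly.
    stream.write(message + "\n")

buf = io.StringIO()            # in-memory object with a file-style interface
report("to a buffer", buf)
report("to stdout")            # goes to the real standard output
captured = buf.getvalue()
```

Passing sys.stderr as the stream would send the same message to standard error instead.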

6. Command-line arguments

The sys module provides access to the command-line arguments through the sys.argv attribute. Command-line arguments are the arguments, other than the program name, given when a program is invoked.

The names argc and argv stand for argument count and argument vector, respectively. The argv variable is an array of strings, one for each argument entered on the command line; argc is the number of arguments entered. In Python, argc is simply the length of the sys.argv list, and the first item of the list, sys.argv[0], is always the name of the program. To summarize:

• sys.argv is the list of command-line arguments
• len(sys.argv) is the number of command-line arguments (that is, argc)

The optparse module, introduced in Python 2.3, helps process command-line arguments and takes a more object-oriented approach.
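A minimal optparse sketch follows. The option names (-v, -o) and the sample argument list are assumptions chosen for the illustration; in a real script the list passed in would be sys.argv. Note that optparse is still shipped with Python 3 but has been superseded by argparse there.

```python
from optparse import OptionParser

def parse(argv):
    """Parse a full argument list whose first item is the program name."""
    parser = OptionParser(usage="%prog [options] FILE...")
    parser.add_option("-v", "--verbose", action="store_true",
                      dest="verbose", default=False,
                      help="print extra information")
    parser.add_option("-o", "--output", dest="output", default=None,
                      help="name of the output file")
    # Skip argv[0], the program name, as described in the text.
    return parser.parse_args(argv[1:])

# Hypothetical command line, standing in for sys.argv.
options, files = parse(["prog.py", "-v", "-o", "out.txt", "a.txt", "b.txt"])
```

parse_args() returns the recognized options as attributes of one object and the leftover positional arguments as a list.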

7. File system

Access to the file system is mostly done through Python's os module, which is the primary interface between Python and the operating system's facilities. The os module is really just a front end for the module that is actually loaded, and that "real" module obviously depends on the specific operating system. It may be one of the following: posix (Unix-like systems), nt (Win32), mac (old Mac OS), dos (DOS), os2 (OS/2), and so on. You never need to import these modules directly. Once you import os, Python selects the right module for you, and you do not have to think about the underlying work.

File and directory access functions of the os module
Function               Description
File processing
mkfifo()/mknod()       Create a named pipe/create a file system node
remove()/unlink()      Delete a file
rename()/renames()     Rename a file
*stat()                Return file information
symlink()              Create a symbolic link
utime()                Update timestamps
tmpfile()              Create and open ('w+b') a new temporary file
walk()                 Generate all the file names in a directory tree
Directories/folders
chdir()/fchdir()       Change the current working directory/same, but via a file descriptor
chroot()               Change the root directory of the current process
listdir()              List the files in the given directory
getcwd()/getcwdu()     Return the current working directory/same, but as a Unicode object
mkdir()/makedirs()     Create a directory/create multiple levels of directories
rmdir()/removedirs()   Remove a directory/remove multiple levels of directories
Access/permissions
access()               Verify permission modes
chmod()                Change permission modes
chown()/lchown()       Change owner and group ID/same, but do not follow links
umask()                Set the default permission modes
File descriptor operations
open()                 Low-level operating system open (for files, use the standard built-in open())
read()/write()         Read/write data through a file descriptor
dup()/dup2()           Duplicate a file descriptor/same, but onto another file descriptor
Device numbers
makedev()              Create a raw device number from the major and minor device numbers
major()/minor()        Get the major/minor device number from a raw device number
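A few of the functions in the table can be combined in a short sketch. It assumes Python 3 and works entirely inside a fresh temporary directory; all directory and file names are made up for the example.

```python
import os
import tempfile

root = tempfile.mkdtemp()
nested = os.path.join(root, "a", "b")
os.makedirs(nested)                     # create several directory levels at once

old = os.path.join(nested, "draft.txt")
new = os.path.join(nested, "final.txt")
with open(old, "w") as f:
    f.write("hello")

os.rename(old, new)                     # rename the file
listing = os.listdir(nested)            # names in one directory
size = os.stat(new).st_size             # file information, here the size in bytes

# walk() yields (directory, subdirectories, files) for each level of the tree.
walked = [files for _directory, _subdirs, files in os.walk(root)]

os.remove(new)                          # delete the file
os.rmdir(nested)                        # remove the now-empty directory
```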

The os.path module performs operations on path names. It provides functions to manage and manipulate the parts of a file path name, to get information about files or subdirectories, and to query file paths.

Path name access functions in the os.path module
Function       Description
basename()     Strip the directory path and return the file name
dirname()      Strip the file name and return the directory path
join()         Join separate components into a single path name
split()        Return a (dirname(), basename()) tuple
splitdrive()   Return a (drivename, pathname) tuple
splitext()     Return a (filename, extension) tuple
getatime()     Return the last access time
getctime()     Return the file creation time
getmtime()     Return the last modification time
getsize()      Return the file size (in bytes)
exists()       Does the path (file or directory) exist?
isabs()        Is the path absolute?
isdir()        Does the path exist and is it a directory?
isfile()       Does the path exist and is it a file?
islink()       Does the path exist and is it a symbolic link?
ismount()      Does the path exist and is it a mount point?
samefile()     Do both path names point to the same file?
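The separation and query functions can be tried on a made-up path, as in the sketch below. The path components are hypothetical; only the temporary directory actually exists on disk.

```python
import os
import tempfile

# Hypothetical relative path, assembled with the platform's separator.
path = os.path.join("reports", "2007", "summary.txt")

directory, filename = os.path.split(path)     # (dirname, basename) pair
stem, extension = os.path.splitext(filename)  # (name, extension) pair
same_name = os.path.basename(path)
same_dir = os.path.dirname(path)

# Query functions need something real to look at.
scratch = tempfile.mkdtemp()
scratch_is_dir = os.path.isdir(scratch)
scratch_is_file = os.path.isfile(scratch)
path_is_abs = os.path.isabs(path)
```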

Core modules: os (and os.path). The os and os.path modules provide different ways to access the computer's file system.

8. Persistent storage modules

Python provides a number of modules that offer minimal persistent storage. One group (marshal and pickle) converts and stores Python objects. This process turns objects more complex than the basic types into a set of binary data, so that the data can be saved or sent over a network and later restored to its original object form. The process is also known as flattening, serializing, or marshalling data. The other modules (dbhash/bsddb, dbm, gdbm, dumbdbm, and so on) and their "manager" (anydbm) provide persistent storage for Python strings only. The last module (shelve) does both.

Both marshal and pickle can convert and store Python objects. The modules themselves do not provide "persistent storage" in the full sense, because they provide neither a namespace for objects nor concurrent write access to persistently stored objects. They can only store converted Python objects, which makes saving and transferring them convenient. Data is stored sequentially (objects are stored and transmitted one after another). The difference between the two is that marshal handles only simple Python objects (numbers, sequences, mappings, and code objects), whereas pickle also handles recursive objects, objects referenced several times from different places, and user-defined classes and instances. pickle also has an accelerated version, cPickle, which implements the same functionality in C.

The *db* family of modules writes data in the traditional DBM format, and Python offers several DBM implementations: dbhash/bsddb, dbm, gdbm, and dumbdbm. If you are unsure which to use, the anydbm module is the best choice: it detects the DBM-compatible modules installed on the system and selects the "best" one. The dumbdbm module has the fewest features, and anydbm falls back to it when no other module is available.

The shelve module uses anydbm to find a suitable DBM module and then uses cPickle to perform the conversion for storage. shelve allows concurrent read access to the database file, but not shared read/write access.
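A minimal shelve sketch is shown below. It assumes Python 3, where the module names differ from those in the text (anydbm became dbm, and pickle takes the place of cPickle), and it stores a made-up inventory in a temporary directory.

```python
import os
import shelve
import tempfile

# Hypothetical database name inside a scratch directory.
db_path = os.path.join(tempfile.mkdtemp(), "inventory")

# A shelf behaves like a dictionary whose values are pickled to disk.
db = shelve.open(db_path)
db["apples"] = {"count": 3, "tags": ["red", "fresh"]}
db["pears"] = 7
db.close()

# Reopen the shelf to show that the objects survived on disk.
db = shelve.open(db_path)
apples = db["apples"]
keys = sorted(db.keys())
db.close()
```

Keys must be strings, while values can be any object that pickle can handle.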

Core modules: pickle and cPickle

With the pickle module you can save Python objects directly to a file without first converting them to strings and without writing them to a binary file through low-level file operations. pickle creates a Python-specific binary format, so you need not worry about any file details; it reads and writes objects cleanly for you, and all it needs is a valid file handle.

The two main functions in the pickle module are dump() and load(). dump() takes a data object and a file handle and saves the object to the file in a specific format. When load() is used to retrieve a saved object from the file, pickle knows how to restore it to its original form. We recommend looking at both pickle and the "smarter" shelve module, which offers dictionary-style access to a file of objects and reduces the programmer's work further.
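The dump()/load() round trip can be sketched as follows (Python 3 assumed; the record and the scratch file are invented for the example). The record deliberately refers to the same list twice, to show that pickle keeps track of objects referenced from several places.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "record.pkl")

shared = [1, 2, 3]
record = {"name": "demo", "first": shared, "second": shared}

# dump(obj, file): the file must be opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(record, f)

# load(file) rebuilds an equal object from the saved bytes.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

After loading, restored["first"] and restored["second"] are again one and the same list, not two copies.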

9. Related Modules

File-related modules
Module              Contents
base64              Encoding/decoding between binary strings and text strings
binascii            Encoding/decoding between binary strings and ASCII-encoded binary strings
bz2                 Access to files compressed in the bz2 format
csv                 Access to CSV (comma-separated values) files
filecmp             Comparison of directories and files
fileinput           Line iterator over multiple text files
getopt/optparse     Parsing and processing of command-line arguments
glob/fnmatch        Unix-style wildcard matching
gzip/zlib           Reading and writing GNU zip (gzip) files (compression requires the zlib module)
shutil              High-level file access functionality
StringIO/cStringIO  File-like interface on top of string objects
tarfile             Reading and writing tar archive files, including compressed ones
tempfile            Generation of temporary files (or file names)
uu                  Encoding and decoding of uuencode-format files
zipfile             Tools for reading ZIP archive files

The fileinput module iterates over a set of input files, reading their contents one line at a time, much like the argument-less "<>" operator in Perl. If no file names are given explicitly, they are taken from the command line by default. The glob and fnmatch modules provide old-style Unix shell pattern matching for file names, for example the asterisk (*) wildcard for any string and the question mark (?) for any single character.
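The three modules can be combined in one short sketch. It assumes Python 3 and creates a few throwaway files in a temporary directory; the file names and contents are invented for the example.

```python
import fileinput
import fnmatch
import glob
import os
import tempfile

folder = tempfile.mkdtemp()
samples = [("a.txt", "one\ntwo\n"), ("b.txt", "three\n"), ("c.log", "ignored\n")]
for name, text in samples:
    with open(os.path.join(folder, name), "w") as f:
        f.write(text)

# glob expands the wildcard against the file system.
txt_files = sorted(glob.glob(os.path.join(folder, "*.txt")))

# fnmatch tests a name against a pattern without touching the disk.
single_char = fnmatch.fnmatch("b.txt", "?.txt")

# fileinput walks the lines of several files as one stream.
lines = [line.rstrip("\n") for line in fileinput.input(files=txt_files)]
fileinput.close()
```

The file names are passed explicitly here; with no files argument, fileinput would take them from the command line as the text describes.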

Core tip: tilde (~) expansion with os.path.expanduser()

Although glob and fnmatch provide Unix-style pattern matching, they do not support the tilde (home directory) character. The os.path.expanduser() function does this: pass it a path containing a tilde and it returns the corresponding absolute path. Unix-family systems also support "~user", which denotes the home directory of the named user. Also note that the Win32 version of the function does not use the backslash as the directory path separator.

The gzip and zlib modules provide direct access to the zlib compression library. The gzip module is built on top of zlib and offers standard file access with automatic gzip-compatible compression and decompression. bz2 is similar to gzip but operates on bzip2-compressed files.
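The relationship between the two modules can be sketched as follows (Python 3 assumed; the payload and the scratch file are invented for the example). gzip gives file-style access, while zlib works directly on byte strings in memory.

```python
import gzip
import os
import tempfile
import zlib

path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
payload = b"line one\nline two\n" * 100

# gzip.open() returns a file-like object that compresses on write...
with gzip.open(path, "wb") as f:
    f.write(payload)

# ...and decompresses transparently on read.
with gzip.open(path, "rb") as f:
    restored = f.read()

compressed_size = os.path.getsize(path)

# zlib exposes the same compression without any file involved.
packed = zlib.compress(payload)
unpacked = zlib.decompress(packed)
```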
