1. File objects
File objects can be used to access not only ordinary disk files but also "files" at any other level of abstraction. Once the appropriate "hooks" are in place, you can access any object that has a file-like interface just as you would an ordinary file. A file is simply a contiguous sequence of bytes, and data transfers usually work on byte streams, regardless of whether the stream consists of a single byte or a large block of data.
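As a minimal sketch of this idea in Python 3 (the helper count_lines() is hypothetical), the same function can consume an in-memory io.StringIO object and a real disk file, because both support the file iteration protocol:

```python
import io
import os
import tempfile

def count_lines(fileobj):
    """Count lines in any object that supports file-style line iteration."""
    return sum(1 for _ in fileobj)

# An in-memory "file": no disk file is involved.
memory_file = io.StringIO("alpha\nbeta\ngamma\n")
print(count_lines(memory_file))  # 3

# A real disk file, handled by exactly the same function.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")
    with open(path, "w") as f:
        f.write("one\ntwo\n")
    with open(path) as f:
        print(count_lines(f))  # 2
```

The function never checks what kind of object it received; it only relies on the file interface.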
2. File built-in functions: open() and file()
The built-in function open() (and file()) provides a general interface for initializing file input/output (I/O); it is the "key" that opens the door to a file.
open() returns a file object if the file is opened successfully; otherwise it raises an error. When the operation fails, Python raises an IOError exception. The basic syntax of open() is:
file_object = open(file_name, access_mode='r', buffering=-1)
file_name is a string containing the name of the file to open; it can be a relative or an absolute path. The optional argument access_mode is also a string and indicates the mode in which the file is opened. Files are usually opened in 'r', 'w', or 'a' mode, meaning read, write, and append respectively. There is also a 'U' mode, which provides universal newline support. A file opened in 'r' or 'U' mode must already exist. A file opened in 'w' mode is truncated first if it exists, and then (re)created. A file opened in 'a' mode is prepared for appending: all data written is appended to the end of the file, even if you seek elsewhere. If the file does not exist, it is created automatically, just as if it had been opened in 'w' mode.
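The following Python 3 sketch checks these rules against a temporary file (the file name modes.txt is arbitrary). Note that in Python 3 a missing file raises FileNotFoundError, a subclass of OSError, of which IOError is now an alias:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "modes.txt")

    # 'r' requires an existing file.
    missing_raised = False
    try:
        with open(path, "r"):
            pass
    except FileNotFoundError:
        missing_raised = True

    with open(path, "w") as f:      # 'w' creates the file
        f.write("first\n")
    with open(path, "w") as f:      # 'w' again: the old content is discarded
        f.write("second\n")

    with open(path, "a") as f:      # 'a': every write goes to the end
        f.seek(0)                   # seeking elsewhere does not change that
        f.write("appended\n")

    with open(path) as f:
        content = f.read()

print(repr(content))  # 'second\nappended\n'
```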
If access_mode is not given, it defaults to 'r'. The other optional argument, buffering, specifies how the file is buffered: 0 means unbuffered, 1 means line buffered, and any value greater than 1 means a buffer of approximately that size. If buffering is omitted or negative, the system default is used: line buffering for TTY-like (teletype) devices and ordinary buffering for all other devices. In general, the system default is fine.
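A small Python 3 sketch of the three buffering choices is below. It assumes Python 3 semantics, where buffering=0 is only allowed in binary mode and buffering=1 (line buffering) only applies in text mode; the object types printed are those of the io module:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "buffering.bin")

    # buffering=0: unbuffered, returns a raw FileIO object (binary mode only).
    with open(path, "wb", buffering=0) as raw:
        unbuffered_type = type(raw)

    # Default buffering (-1): a buffered binary writer.
    with open(path, "wb") as buffered:
        default_type = type(buffered)

    # buffering=1: line buffering, honoured in text mode.
    with open(path, "w", buffering=1) as text:
        line_buffered = text.line_buffering

print(unbuffered_type.__name__, default_type.__name__, line_buffered)
# FileIO BufferedWriter True
```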
Access modes for file objects
File mode | Operation
r | Open for reading
rU or Ua | Open for reading with universal newline support (PEP 278; new in Python 2.3)
w | Open for writing (truncating the file if necessary)
a | Open for appending (writing starts at EOF; the file is created if necessary)
r+ | Open for reading and writing
w+ | Open for reading and writing (see w)
a+ | Open for reading and writing (see a)
rb | Open for reading in binary mode
wb | Open for writing in binary mode (see w)
ab | Open for appending in binary mode (see a)
rb+ | Open for reading and writing in binary mode (see r+)
wb+ | Open for reading and writing in binary mode (see w+)
ab+ | Open for reading and writing in binary mode (see a+)
open() and file() have identical functionality and can be used interchangeably. In general, we recommend using open() to open files for reading and writing.
Universal newline support (UNS): when you open a file with the 'U' flag, every line separator (line terminator), whatever its form, is returned as a newline character (\n) by Python's input methods (for example, the read*() methods). ('rU' mode also supports the 'b' option.) The feature also works for files that contain a mix of line terminators. The file object's newlines attribute records the line terminators it has "seen".
Note that UNS applies only when reading text files; there is no corresponding feature for file output.
UNS is enabled by default when Python is compiled. If you do not need the feature, you can turn it off with the --without-universal-newlines switch when running the configure script.
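In Python 3 the 'U' flag is gone (it was deprecated and then removed in Python 3.11), and universal newline handling is simply the default for text mode (newline=None). A minimal sketch, assuming Python 3, writes three kinds of terminators as raw bytes and reads them back:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mixed.txt")

    # Raw bytes with POSIX, classic Mac OS, and DOS line terminators.
    with open(path, "wb") as f:
        f.write(b"unix\nold mac\rdos\r\n")

    # Text mode with newline=None translates every terminator to '\n'.
    with open(path, "r", newline=None) as f:
        lines = f.readlines()
        seen = f.newlines   # the terminators encountered so far

print(lines)  # ['unix\n', 'old mac\n', 'dos\n']
print(seen)
```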
3. File built-in methods
After open() succeeds and returns a file object, all subsequent operations on the file are performed through this handle. File methods fall into four categories: input, output, movement within the file, and miscellaneous operations.
1) Input
The read() method reads bytes directly into a string, up to the given number of bytes. If no size argument is given (the default is -1) or size is negative, the file is read to the end.
The readline() method reads one line from the open file (all bytes up to the next line terminator) and returns the whole line, line terminator included, as a string. It also takes an optional size argument, which defaults to -1, meaning read up to the line terminator. If size is given, an incomplete line is returned once size bytes have been read.
The readlines() method does not return a string as the other two input methods do. It reads all (remaining) lines and returns them as a list of strings. Its optional argument, sizehint, is a hint for the maximum number of bytes to return. If it is greater than 0, the lines returned total approximately sizehint bytes (possibly slightly more, because the count is rounded up to the internal buffer size).
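A short Python 3 sketch of the three input methods on the same file follows. One assumption to keep in mind: in Python 3 text mode, the size arguments count characters rather than bytes:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "input.txt")
    with open(path, "w") as f:
        f.write("first line\nsecond line\nthird line\n")

    with open(path) as f:
        whole = f.read()              # no size: read to the end of the file

    with open(path) as f:
        head = f.readline(5)          # size given: stop after 5 characters
        rest = f.readline()           # continue up to and including '\n'
        remaining = f.readlines()     # list of the remaining lines

print(repr(head), repr(rest))   # 'first' ' line\n'
print(remaining)                # ['second line\n', 'third line\n']
```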
2) Output
The write() built-in method writes a string containing text data or a block of binary data to a file.
The writelines() method is a list-based operation: it takes a list of strings as its argument and writes them to the file. Line terminators are not added automatically, so if necessary you must append one to the end of each line before calling writelines().
Core Note: Line separators are preserved. When you read lines from a file with an input method such as read() or readlines(), Python does not strip the line terminators; that is left to the programmer. Likewise, the output methods write() and writelines() do not add line terminators; you must add them yourself before writing the data to the file.
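The Python 3 sketch below illustrates both halves of this note: writelines() runs the strings together unless you add terminators, and readlines() keeps them until you strip them yourself:

```python
import os
import tempfile

lines = ["alpha", "beta", "gamma"]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "lines.txt")

    # writelines() inserts nothing between items, so the words run together.
    with open(path, "w") as f:
        f.writelines(lines)
    with open(path) as f:
        joined = f.read()             # 'alphabetagamma'

    # Add the terminators yourself before writing.
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in lines)

    # Reading keeps each '\n'; strip it explicitly if you do not want it.
    with open(path) as f:
        raw = f.readlines()           # ['alpha\n', 'beta\n', 'gamma\n']
    stripped = [line.rstrip("\n") for line in raw]
```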
3) Moving within a file
The seek() method moves the file pointer to a different position in the file. The offset, in bytes, is relative to a reference position: 0 (the default) means the beginning of the file (an absolute offset), 1 means the current position, and 2 means the end of the file.
The tell() method complements seek(): it tells you the current position of the file pointer, measured in bytes from the beginning of the file.
4) File iteration
With the introduction of iterators and file iteration in Python 2.2, file objects became their own iterators, so the iterator's next() method can be used: file.next() reads the next line of the file. File iteration is more efficient, and Python code written this way is easier to write (and read).
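A minimal Python 3 sketch of file iteration follows. In Python 3, file.next() is spelled next(file); the rest of the behavior matches the description above:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "iter.txt")
    with open(path, "w") as f:
        f.write("one\ntwo\nthree\n")

    exhausted = False
    with open(path) as f:
        is_own_iterator = iter(f) is f      # the file object is its own iterator
        first = next(f)                     # Python 3 spelling of f.next()
        rest = [line.rstrip("\n") for line in f]  # the loop resumes after 'one'
        try:
            next(f)
        except StopIteration:               # raised once no lines remain
            exhausted = True
```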
5) Other
close() ends access to a file by closing it. Python's garbage collection mechanism also closes a file automatically when the reference count of the file object drops to zero.
The fileno() method returns the descriptor of the open file. This is an integer that can be used with lower-level operations, such as those in the os module (os.read()).
Calling flush() writes the data in the internal buffer to the file immediately, instead of waiting passively for the output buffer to be written.
isatty() is a Boolean method that returns True if the file is a TTY-like device and False otherwise.
The truncate() method truncates the file at the current file pointer position or at the given size, in bytes.
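The sketch below exercises flush(), fileno(), isatty(), close(), and the closed attribute in Python 3. The file size before flush() is not asserted, because it depends on buffering; with the default buffer it is usually still 0:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "misc.txt")

    f = open(path, "w")
    f.write("buffered text")
    size_before_flush = os.path.getsize(path)  # usually 0: data still buffered
    f.flush()                                   # push the buffer to the OS
    size_after_flush = os.path.getsize(path)    # 13

    fd = f.fileno()                             # an int usable with os-level calls
    tty = f.isatty()                            # False for a regular disk file
    f.close()
    closed = f.closed                           # True

    # The descriptor of an open file can be passed to os.read().
    with open(path, "rb") as g:
        data = os.read(g.fileno(), 100)         # b'buffered text'
```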
6) File method miscellany
Core Note: Line separators and other file system differences
One difference between operating systems is the line separator they use. On POSIX systems (the Unix family or Mac OS X), the line separator is the newline character (\n). Classic Mac OS uses the carriage return (\r), and DOS and Win32 systems use both (\r\n).
Another difference is the path separator (POSIX uses "/", DOS and Windows use "\", and older versions of Mac OS use ":"), which separates the components of a file path name, as well as the names used to refer to the current directory and the parent directory.
The designers of Python's os module have already thought about these issues for us. The os module has five useful attributes:
os module attributes that help with cross-platform development
os module attribute | Description
linesep | String used to separate lines in a file
sep | String used to separate the components of a file path name
pathsep | String used to separate file paths (for example, in a search path)
curdir | String name of the current directory
pardir | String name of the parent directory (of the current directory)
Whatever platform you are using, once you import the os module these variables are automatically set to the correct values, which saves you trouble.
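A quick Python 3 sketch that inspects these attributes and uses sep and pathsep instead of hard-coded characters (the path parts are arbitrary examples; in practice os.path.join() is the usual way to build paths):

```python
import os

print(repr(os.linesep))       # '\n' on POSIX, '\r\n' on Windows
print(repr(os.sep))           # '/' on POSIX, '\\' on Windows
print(repr(os.pathsep))       # ':' on POSIX, ';' on Windows
print(os.curdir, os.pardir)   # . ..

# Build a relative path with os.sep rather than a literal separator.
relative = os.sep.join(["usr", "local", "bin"])

# Split a search path such as PATH with os.pathsep.
search_dirs = os.environ.get("PATH", "").split(os.pathsep)
```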
The print statement ends its output with a newline by default; a trailing comma after the statement suppresses this. readline() and readlines() do not strip any whitespace from the line, so you need to add the comma. If you omit it, each displayed line will be followed by two line breaks: one that was part of the input line and one added automatically by print.
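In Python 3, print is a function, and the end="" keyword plays roughly the role of Python 2's trailing comma. This sketch captures the output in io.StringIO objects so the doubled newlines are visible:

```python
import io

lines = ["first\n", "second\n"]   # as returned by readlines(): terminators kept

doubled = io.StringIO()
for line in lines:
    print(line, file=doubled)            # print adds its own '\n'

single = io.StringIO()
for line in lines:
    print(line, end="", file=single)     # end="" suppresses the extra newline
```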
The file object also has a truncate() method that takes an optional size argument. If size is given, the file is truncated to at most size bytes. If size is omitted, the file is truncated at the current position. For example, if you open a file and immediately call truncate(), the file's content is effectively deleted, because you are truncating at byte 0 (the value tell() would return).
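A minimal Python 3 sketch of both forms of truncate(), using 'r+' so the file is opened without being emptied first:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "trunc.txt")
    with open(path, "w") as f:
        f.write("0123456789")

    with open(path, "r+") as f:
        f.truncate(4)                  # explicit size: keep the first 4 bytes
    size_after_explicit = os.path.getsize(path)   # 4

    with open(path, "r+") as f:
        pos = f.tell()                 # 0 right after opening
        f.truncate()                   # no size: cut at the current position
    size_after_default = os.path.getsize(path)    # 0
```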
filename = input('Enter file name: ')
fobj = open(filename, 'w')
while True:
    aLine = input("Enter a line ('.' to quit): ")
    if aLine != '.':
        # Text mode translates '\n' to the platform line separator on write;
        # writing os.linesep itself would produce '\r\r\n' on Windows.
        fobj.write('%s\n' % aLine)
    else:
        break
fobj.close()
>>> f = open('/tmp/x', 'w+')
>>> f.tell()
0
>>> f.write('test line 1\n')   # add a string of length 12 [0-11]
>>> f.tell()
12
>>> f.write('test line 2\n')   # add a string of length 12 [12-23]
>>> f.tell()                   # tell us the current position
24
>>> f.seek(-12, 1)             # move 12 bytes backward
>>> f.tell()                   # at the beginning of the second line
12
>>> f.readline()
'test line 2\n'
>>> f.seek(0, 0)               # back to the beginning
>>> f.readline()
'test line 1\n'
>>> f.tell()                   # now at the start of the second line
12
>>> f.readline()
'test line 2\n'
>>> f.tell()                   # at the end
24
>>> f.close()                  # close the file
file.seek(off, whence=0): moves the file pointer off bytes; a positive offset moves toward the end of the file and a negative offset toward the beginning. whence sets the reference point: 0 is the beginning of the file, 1 is the current position, and 2 is the end of the file. [Note: to seek relative to the end of the file, the file should be opened in binary ('b') mode.]
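The binary-mode caveat is stricter in Python 3: text-mode files reject nonzero offsets relative to the current position or the end. This sketch, assuming Python 3, repeats the seek arithmetic of the session above in binary mode and shows the text-mode refusal:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "seek.bin")
    with open(path, "wb") as f:
        f.write(b"test line 1\ntest line 2\n")

    # Binary mode: any offset works with whence 0, 1, or 2.
    with open(path, "rb") as f:
        f.seek(-12, 2)              # 12 bytes before the end of the file
        second = f.readline()       # b'test line 2\n'
        f.seek(-12, 1)              # 12 bytes back from the current position
        again = f.readline()        # b'test line 2\n'
        end = f.seek(0, 2)          # seek() returns the new absolute position

    # Text mode: only zero offsets are allowed with whence 1 or 2.
    text_mode_refused = False
    with open(path, "r") as f:
        try:
            f.seek(-12, 2)
        except io.UnsupportedOperation:
            text_mode_refused = True
```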
Built-in methods of file objects
File object method | Operation
file.close() | Close the file
file.fileno() | Return the file's descriptor (file descriptor, FD, an integer)
file.flush() | Flush the file's internal buffer
file.isatty() | Return True if file is a TTY-like device
file.next() | Return the next line of the file (like file.readline()), or raise StopIteration when no lines remain
file.read(size=-1) | Read size bytes from the file and return them as a string; if size is omitted or negative, read all remaining bytes
file.readinto(buf, size) | Read size bytes from the file into the buffer buf (not supported)
file.readline(size=-1) | Read and return one line from the file (including the line terminator), or at most size characters
file.readlines(sizehint=0) | Read all lines of the file and return them as a list (line terminators included); if sizehint is greater than 0, return lines totaling approximately sizehint bytes, rounded up to the buffer size (for example, if the buffer size is a multiple of 4K and sizehint is 15K, about 16K may be returned. Translator's note)
file.xreadlines() | A more efficient way to iterate, usable in place of readlines()
file.seek(off, whence=0) | Move the file pointer off bytes from whence (0 = beginning of file, 1 = current position, 2 = end of file)
file.tell() | Return the current position in the file
file.truncate(size=file.tell()) | Truncate the file to at most size bytes; defaults to the current file position
file.write(str) | Write the string str to the file
file.writelines(seq) | Write the sequence of strings seq to the file; seq should be an iterable that yields strings (before 2.2, it had to be a list of strings)
4. File built-in attributes
Attributes of file objects
File object attribute | Description
file.closed | True if the file has been closed, False otherwise
file.encoding | Encoding used by the file: when Unicode strings are written, they are automatically converted to byte strings using file.encoding; if file.encoding is None, the system default encoding is used
file.mode | Access mode used when the file was opened
file.name | File name
file.newlines | None if no line separator has been read yet; a string if only one kind of line separator has been seen; a tuple of all line terminators encountered so far if the file contains more than one kind
file.softspace | Flag used by the print statement: nonzero means a space must be written before the next value is printed, 0 means it need not be. It is generally used internally rather than by programmers
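A small Python 3 sketch reading these attributes on a freshly opened file follows. It assumes Python 3, where softspace no longer exists and encoding reflects the encoding argument passed to open():

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "attrs.txt")
    f = open(path, "w", encoding="utf-8")
    info = (f.name == path, f.mode, f.encoding, f.closed)
    f.close()
    closed_after = f.closed

print(info)   # (True, 'w', 'utf-8', False)
```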
5. Standard files
Generally, as soon as your program starts running, it has access to three standard files: standard input (usually the keyboard), standard output (buffered output to the display), and standard error (unbuffered output to the screen). ("Buffered" and "unbuffered" here refer to the third argument of open().) Following their names in C, these files are called stdin, stdout, and stderr.
In Python, the handles of these files are accessible through the sys module: after importing sys, use sys.stdin, sys.stdout, and sys.stderr. The print statement usually writes to sys.stdout, and the built-in raw_input() usually reads from sys.stdin.
Remember that the sys.* objects are files, so you must handle line breaks yourself, whereas the print statement automatically appends a newline to the string it outputs.
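The difference shows up clearly in a Python 3 sketch (the greet() function is hypothetical): sys.stdout.write() needs an explicit "\n", while print() adds one. contextlib.redirect_stdout temporarily swaps sys.stdout for an io.StringIO so the output can be inspected:

```python
import io
import sys
from contextlib import redirect_stdout

def greet(name):
    sys.stdout.write("Hello, " + name + "\n")  # write() adds no newline
    print("Goodbye,", name)                     # print() appends one itself

buf = io.StringIO()
with redirect_stdout(buf):   # sys.stdout points at buf inside this block
    greet("Guido")
```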
6. Command-line arguments
The sys module provides access to command-line arguments through the sys.argv attribute. Command-line arguments are the arguments, other than the program name, passed when the program is invoked.
argc and argv stand for argument count and argument vector, respectively. argv is an array of strings, one for each argument entered on the command line; argc is the number of arguments entered. In Python, argc is simply the length of the sys.argv list, and the first item of the list, sys.argv[0], is always the name of the program. In summary:
- sys.argv is the list of command-line arguments
- len(sys.argv) is the number of command-line arguments (that is, argc)
The optparse module, introduced in Python 2.3, helps process command-line arguments in an object-oriented way.
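A minimal optparse sketch follows, runnable on Python 3 where optparse is still in the standard library (argparse, added in 2.7 and 3.2, is the newer alternative). The helper parse_command_line() and the -v/-n options are hypothetical; in a real script you would pass it sys.argv:

```python
from optparse import OptionParser

def parse_command_line(argv):
    """Hypothetical helper: parse an argv-style list (argv[0] is the program name)."""
    parser = OptionParser(usage="%prog [options] FILE...")
    parser.add_option("-v", "--verbose", action="store_true", dest="verbose",
                      default=False, help="print extra information")
    parser.add_option("-n", "--lines", type="int", dest="lines",
                      default=10, help="number of lines to show")
    options, args = parser.parse_args(argv[1:])   # skip the program name
    return len(argv), options, args               # len(argv) is argc

argc, options, args = parse_command_line(["demo.py", "-v", "-n", "5", "notes.txt"])
```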
7. File system
Access to the file system is mostly through Python's os module, which is Python's primary interface to operating system services. The os module is really only a front end; the module that actually gets loaded depends on the specific operating system. This "real" module may be one of the following: posix (for Unix-like systems), nt (Win32), mac (classic Mac OS), dos (DOS), os2 (OS/2), and so on. You do not need to import these modules directly: as soon as you import os, Python selects the right module for you, and you do not have to worry about the underlying details.
File and directory access functions in the os module
Function | Description
File processing
mkfifo()/mknod() | Create a named pipe/create a file system node
remove()/unlink() | Delete a file
rename()/renames() | Rename a file
*stat() | Return file information
symlink() | Create a symbolic link
utime() | Update timestamps
tmpfile() | Create and open ('w+b') a new temporary file
walk() | Generate all file names under a directory tree
Directories/folders
chdir()/fchdir() | Change the current working directory/change it using a file descriptor
chroot() | Change the root directory of the current process
listdir() | List the files in the specified directory
getcwd()/getcwdu() | Return the current working directory/same, but return a Unicode object
mkdir()/makedirs() | Create a directory/create a multi-level directory
rmdir()/removedirs() | Delete a directory/delete a multi-level directory
Access/permissions
access() | Check permission modes
chmod() | Change permission modes
chown()/lchown() | Change owner and group ID/same, but do not follow symbolic links
umask() | Set the default permission mode
File descriptor operations
open() | Low-level operating system open (for files, use the standard built-in open() function)
read()/write() | Read/write data using a file descriptor
dup()/dup2() | Duplicate a file descriptor/same, but duplicate onto another file descriptor
Device numbers
makedev() | Create a raw device number from major and minor device numbers
major()/minor() | Get the major/minor device number from a raw device number
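Several of these functions can be combined in a Python 3 sketch that works entirely inside a temporary directory (all file and directory names are arbitrary). The sentinel file matters because removedirs() keeps removing parent directories until one is not empty:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # A sentinel file keeps removedirs() from climbing above tmpdir.
    with open(os.path.join(tmpdir, "keep.txt"), "w") as f:
        f.write("x")

    nested = os.path.join(tmpdir, "a", "b", "c")
    os.makedirs(nested)                        # create several levels at once

    old_name = os.path.join(nested, "old.txt")
    with open(old_name, "w") as f:
        f.write("data")
    new_name = os.path.join(nested, "new.txt")
    os.rename(old_name, new_name)              # rename the file

    listing = os.listdir(nested)               # ['new.txt']
    size = os.stat(new_name).st_size           # 4 bytes

    walked = []
    for dirpath, dirnames, filenames in os.walk(tmpdir):
        walked.extend(filenames)               # every file name in the tree

    os.remove(new_name)                        # delete the file
    os.removedirs(nested)                      # removes c, b, a; stops at tmpdir
    a_exists = os.path.exists(os.path.join(tmpdir, "a"))
```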
The os.path module performs operations on path names. It provides functions to manage and manipulate the parts of a file path name, get information about files or subdirectories, query file paths, and so on.
Path name access functions in the os.path module
Function | Description
Splitting
basename() | Remove the directory path and return the file name
dirname() | Remove the file name and return the directory path
join() | Combine separate parts into one path name
split() | Return a (dirname(), basename()) tuple
splitdrive() | Return a (drivename, pathname) tuple
splitext() | Return a (filename, extension) tuple
Information
getatime() | Return the last access time
getctime() | Return the ctime (creation time on Windows, last metadata change on Unix)
getmtime() | Return the last modification time
getsize() | Return the file size (in bytes)
Queries
exists() | Whether the given path (file or directory) exists
isabs() | Whether the given path is an absolute path
isdir() | Whether the given path exists and is a directory
isfile() | Whether the given path exists and is a file
islink() | Whether the given path exists and is a symbolic link
ismount() | Whether the given path exists and is a mount point
samefile() | Whether two path names refer to the same file
Core modules: os (and os.path). The os and os.path modules provide different ways to access the computer's file system.
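A Python 3 sketch of the splitting, information, and query groups follows. The path components are arbitrary, and the assertions use os.path.join() so they hold on both POSIX and Windows:

```python
import os
import tempfile

path = os.path.join("projects", "demo", "report.txt")
head, tail = os.path.split(path)         # (dirname, basename)
root, ext = os.path.splitext(tail)       # ('report', '.txt')
drive, rest = os.path.splitdrive(path)   # drive is '' for a relative path

with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "report.txt")
    with open(target, "w") as f:
        f.write("hello")
    checks = {
        "exists": os.path.exists(target),
        "isfile": os.path.isfile(target),
        "isdir": os.path.isdir(tmpdir),
        "isabs": os.path.isabs(target),
        "size": os.path.getsize(target),
        "same": os.path.samefile(target, os.path.join(tmpdir, ".", "report.txt")),
    }
```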
8. Persistent storage modules
Python provides a number of modules that offer minimal persistent storage. One group (marshal and pickle) converts and stores Python objects: an object more complex than a basic type is converted into a sequence of binary data, which can be saved or sent over a network and later restored to the original object. This process is also called flattening, serialization, or marshalling of data. Another group (dbhash/bsddb, dbm, gdbm, dumbdbm, etc.) and their "manager" (anydbm) provide persistent storage of Python strings only. The last module, shelve, does both.
Both marshal and pickle can convert and store Python objects. These modules do not by themselves provide "persistent storage", because they provide neither a namespace for objects nor concurrent write access to persistent objects. They can only store converted Python objects, which makes saving and transmitting them convenient. Storage is sequential (objects are stored and transmitted one after another). The difference between the two is that marshal handles only simple Python objects (numbers, sequences, mappings, and code objects), whereas pickle can also handle recursive objects, objects referenced multiple times from different places, and user-defined classes and instances. The pickle module also has an enhanced version, cPickle, which implements the same functionality in C.
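A short Python 3 sketch of marshal's limits: simple built-in values round-trip, but an instance of a user-defined class (the hypothetical Point below) is rejected with ValueError. Note that marshal's format is tied to the Python version, so it is not meant for long-term storage:

```python
import marshal

simple = {"nums": [1, 2.5, 3j], "name": "spam", "pair": (True, None)}
restored = marshal.loads(marshal.dumps(simple))   # simple built-in types round-trip

class Point:                                      # hypothetical user-defined class
    def __init__(self, x, y):
        self.x, self.y = x, y

instance_rejected = False
try:
    marshal.dumps(Point(1, 2))
except ValueError:                                # marshal refuses class instances
    instance_rejected = True
```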
The *db* series of modules write data in the traditional dbm format, and Python offers several dbm implementations: dbhash/bsddb, dbm, gdbm, and dumbdbm. If you are not sure which to use, it is best to use the anydbm module, which automatically detects the DBM-compatible modules installed on the system and selects the "best" one. dumbdbm has the least functionality, and anydbm chooses it only when no other module is available.
The shelve module uses anydbm to find a suitable DBM module and then uses cPickle to perform the storage conversion. shelve allows concurrent read access to the database file, but not shared read/write access.
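In Python 3, anydbm became dbm and cPickle was folded into pickle, but shelve works the same way. A minimal sketch (the database name inventory and its keys are arbitrary; the underlying dbm module may add a file extension):

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    db_path = os.path.join(tmpdir, "inventory")

    with shelve.open(db_path) as db:               # dictionary-style access
        db["apples"] = {"count": 3, "tags": ["red", "fruit"]}
        db["pears"] = [1, 2, 3]

    with shelve.open(db_path, flag="r") as db:     # reopen read-only
        apples = db["apples"]
        keys = sorted(db.keys())
```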
Core modules: pickle and cPickle
With pickle you can save Python objects directly to a file without converting them to strings, or write them to a binary file without low-level file access. pickle creates a Python-specific binary format; you do not need to worry about any file details, because it reads and writes objects cleanly for you. All it needs is a valid file handle.
The two main functions in pickle are dump() and load(). dump() takes a data object and a file handle as arguments and saves the object to the given file in a specific format. When we use load() to retrieve a saved object from the file, pickle knows how to restore it to its original form. We recommend taking a look at pickle as well as the "smarter" shelve module, which provides dictionary-style access to file objects and further reduces the programmer's work.
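A minimal Python 3 sketch of dump() and load() (the helper save_and_load() is hypothetical). It round-trips a class instance, a list referenced twice, and a recursive reference, all of which pickle preserves. Python 3's pickle uses the C accelerator automatically, so there is no separate cPickle:

```python
import datetime
import os
import pickle
import tempfile

def save_and_load(obj, path):
    """Hypothetical helper: dump obj to path with pickle, then load it back."""
    with open(path, "wb") as f:      # pickle data is binary, so use 'wb'/'rb'
        pickle.dump(obj, f)          # argument order: the object first, then the file
    with open(path, "rb") as f:
        return pickle.load(f)

shared = [1, 2]
data = {
    "when": datetime.date(2006, 9, 1),   # a class instance
    "a": shared,
    "b": shared,                         # the same list referenced twice
}
data["self"] = data                      # a recursive reference

with tempfile.TemporaryDirectory() as tmpdir:
    restored = save_and_load(data, os.path.join(tmpdir, "data.pkl"))
```

Instances of your own module-level classes are pickled the same way as the datetime.date instance here.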
9. Related Modules
File-related modules
Module | Content
base64 | Encoding/decoding between binary strings and text strings
binascii | Encoding/decoding between binary and ASCII-encoded binary strings
bz2 | Access to files compressed in the bz2 format
csv | Access to CSV (comma-separated values) files
filecmp | Comparison of directories and files
fileinput | Line iterator over multiple text files
getopt/optparse | Parsing/processing of command-line arguments
glob/fnmatch | Unix-style wildcard matching
gzip/zlib | Read and write GNU zip (gzip) files (compression requires the zlib module)
shutil | High-level file access operations
StringIO/cStringIO | File-like interface to string objects
tarfile | Read and write TAR archive files, including compressed ones
tempfile | Create temporary files (and file names)
uu | Encoding and decoding of the uuencode format
zipfile | Tools for reading ZIP archive files
The fileinput module iterates over a set of input files, reading one line of their contents at a time, similar to Perl's "<>" operator used without arguments. If no file names are given explicitly, they are taken from the command line by default. The glob and fnmatch modules provide file name pattern matching in the old Unix shell style, for example the asterisk (*) wildcard matching any string and the question mark (?) matching any single character.
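The three modules can be seen together in a Python 3 sketch (the file names and contents are arbitrary). fileinput is given an explicit file list here rather than reading names from the command line, and fnmatch.fnmatchcase() is used so the result does not depend on the platform's case rules:

```python
import fileinput
import fnmatch
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    for name, text in [("a.txt", "a1\na2\n"), ("b.txt", "b1\n"), ("notes.md", "n1\n")]:
        with open(os.path.join(tmpdir, name), "w") as f:
            f.write(text)

    # glob: shell-style wildcards expanded against the file system.
    txt_files = sorted(glob.glob(os.path.join(tmpdir, "*.txt")))

    # fileinput: one line at a time across several files, like Perl's <>.
    seen = []
    with fileinput.input(files=txt_files) as stream:
        for line in stream:
            seen.append((os.path.basename(stream.filename()),
                         stream.filelineno(), line.rstrip("\n")))

# fnmatch: the same wildcard rules applied to plain strings.
matches = [n for n in ["a.txt", "b.txt", "notes.md", "c.tx"]
           if fnmatch.fnmatchcase(n, "?.txt")]
```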
Core Tip: Use os.path.expanduser() for tilde (~) expansion
Although glob and fnmatch provide Unix-style pattern matching, they do not support the tilde (user home directory) character. You can use os.path.expanduser() for this: pass it a path containing a tilde, and it returns the corresponding absolute path. Unix-family systems also support "~user", meaning the home directory of the specified user. Also note that the Win32 version of the function does not use backslashes to separate directory paths.
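A minimal Python 3 sketch: expand the tilde first, then hand the result to glob. The home directory shown in the comment is a hypothetical example; the real value comes from the environment (HOME on POSIX):

```python
import glob
import os

pattern = "~/*.txt"
literal_matches = glob.glob(pattern)       # '~' is not expanded: usually []
expanded = os.path.expanduser(pattern)     # e.g. '/home/alice/*.txt' on POSIX
matches = glob.glob(expanded)              # matched against the real home directory

unchanged = os.path.expanduser("data/~notes")   # a non-leading tilde is left alone
```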
The gzip and zlib modules provide direct access to the zlib compression library. gzip is built on top of zlib; besides standard file access, it also handles gzip compression and decompression automatically. bz2 is similar to gzip but works on bzip2-compressed files.
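A Python 3 sketch of the three modules (file names and payload are arbitrary): zlib compresses bytes in memory, while gzip.open() and bz2.open() return file objects that compress and decompress transparently:

```python
import bz2
import gzip
import os
import tempfile
import zlib

payload = b"spam and eggs\n" * 100

# zlib: compress and decompress bytes in memory.
packed = zlib.compress(payload)
unpacked = zlib.decompress(packed)

with tempfile.TemporaryDirectory() as tmpdir:
    # gzip.open: text mode ('wt'/'rt') handles encoding and compression together.
    gz_path = os.path.join(tmpdir, "data.txt.gz")
    with gzip.open(gz_path, "wt") as f:
        f.write("line one\nline two\n")
    with gzip.open(gz_path, "rt") as f:
        gz_lines = f.readlines()

    # bz2.open offers the same kind of file interface for bzip2 data.
    bz_path = os.path.join(tmpdir, "data.txt.bz2")
    with bz2.open(bz_path, "wb") as f:
        f.write(payload)
    with bz2.open(bz_path, "rb") as f:
        bz_data = f.read()
```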
Python Core Programming (Second Edition): Files and Input/Output