Python-detailed introduction to file processing

Source: Internet
Author: User
This article describes how to process python files. in python, if you want to operate a file on a hard disk, you can perform the following three steps:

Use the open function to open a file handle and assign a value to a variable.

Use the corresponding file handle to operate the specified file.

Close the file after the operation is complete. after the file is closed, the file content will be written to the disk.

The following describes how to use open functions.

Open ('file path', mode = 'File opening mode', encoding = 'File encoding mode ')

File path: This file path can be an absolute or relative path. in python, you only need to write the file name, if the python program and the file to be opened are in a directory, you can directly use the relative path.

Note! If you want to use the open function, the file path is a required parameter and cannot be passed!

File opening mode: The open function contains an optional parameter, namely mode, which defines the file opening mode. if the file opening mode is not specified, r (read-only) is used by default) open the file.

Common file opening methods in python are as follows:

'R' open the file in read-only mode and open the file in r (read-only mode). the file can only be read and cannot be written. (If the file does not exist, an exception is thrown)

'W' open the file in write-only mode and open the file in w (write-only mode). the file can only be written and cannot be read, but pay special attention to it !!!!! A file with content already exists. once opened with w, the content of this file will be cleared !! (As for the reason, this article will be added later .) (If you do not want the original file content to be cleared !! Do not use w mode !!!!!!!) (If the object does not exist, a file is created. if the object exists, the file content is cleared first)

The 'A' append mode is also a write-only mode. it is used to append content to the end of the file.

'B' binary mode. use the binary mode to open the file. Note! This 'B' (binary mode) is used in combination with the (r, w, a) mode. (This mode is recommended for cross-platform and cross-operating systems)

'R + 'is readable and writable (in this mode, although it is readable and writable, you must note that the seek pointer is still in the file header, if you do not adjust the position of the seek pointer to start writing directly, it will directly overwrite the previously written content (starting from the file header). Therefore, when you use the r + mode, be sure to pay attention to the position of the seek pointer, where is the file !! Otherwise, the original content will be overwritten .)

'W + 'can be written and readable (this mode is generally not used and files are directly cleared)

Appended to the end of 'A + ', writable and readable

I. common methods for operations on file objects

Read files:

Readable () is used to determine whether the object is readable. if it is readable, True is returned; otherwise, False is returned.

Readline () reads a row of a file at a time and returns the string type.

Read () reads all the content of the file at a time and returns an entire string.

Readlines () reads all the content of the file and adds each row of the file to a list. each row of the file will be used as an element in the list.

Write file:

Writable (): determines whether a file can be written. if it can be written, True is returned. otherwise, False is returned.

Write (): write content into the file. this method can be used only when the file is in writable mode. the specific position of the file to be written is, it depends on the file opening mode (r +, a +, or w +) and the location of the file pointed by the current seek pointer. (In addition, if you use the write method to write content to the file, there is no line break. you must manually add a line break. otherwise, all the content will be glued together .)

Example: f1.write ('Hello! \ N') # \ n is a line break.

Writelines (): Similar to wirte, writelines writes content to the file. Unlike write, writelines writes content to the file in the form of a list, using the writelines method, python will loop the list, and every element in the list will be written to the file.

Note! When using writelines to write content inside the file, there is no line break. if you add a line break to the end of each element, each element in the list is a line in the file.

Note !! The content written in the file can only be a string, not other types !! Otherwise, an exception will be thrown. even if you want to write a number, the number must be converted to the string type !!!

Other operations:

Close () close the file. use close to close the file after reading or writing it! (Except for the with syntax, because the with keyword is used to open a file, the file is automatically closed after the operation on the file ends ).

After reading the file, the program will continue to occupy system resources.

If you do not close the file after writing it, the content in the memory will not be synchronized to the hard disk in time. if you want to completely write the content to the hard disk, use close to close the file. otherwise, use the flush method to forcibly fl the data in the memory to the hard disk.

Flush () clears the data that is not written to the hard disk in the memory to the hard disk.

Encoding: Display the file opening encoding (this method is not available in python2 and can be used in python3 .)

Tell (): obtains the position of the current seek pointer.

Seek (pointer position, pattern) moves the seek pointer in bytes and the position in the file.

In pyrhon, there are three modes for seek pointer operations. The following describes the three modes in detail:

File. seek (n, 0) # n represents the pointer position, and the number 0 after it represents the pattern sequence number.

File. seek (n, 0): (mode 0) mode 0 indicates the absolute position. if n is the number, the pointer is moved to the nth byte starting from the beginning of the file. (If the mode is not specified when seek is used, the default mode is 0 .)

For example, file. seek (0th) moves the pointer to the position of three bytes starting from the beginning of the file (bytes.

# Before using pointers

F1 = open ('Seasons. lrc ', mode = 'r ')

Print f1.readline ()

>>> Miyazaki zookeeper-Seasons

# The following uses the 0 mode to move the pointer to the first byte of the object. after the pointer is moved to the second byte, the object will be read after the pointer.

F1 = open('seasons.txt ', mode = 'r ')

Print f1.tell () # display the current position of the seek pointer

>>> 0 # (the position is 0, indicating that the pointer is at the beginning of the file)

F1.seek (3, 0) # (move the pointer to the third byte, in the 0 mode, absolute position)

Print f1.tell () # Check the pointer position again to verify that the pointer position is indeed moved to the third byte.

>>> 3

Print f1.readline () # read a row from the end of the current pointer.

>>> Kawasaki zookeeper-Seasons

At this time, someone may ask, reading a file is correct starting from behind the pointer, but it is clear that the pointer has been moved to three bytes. Why did it skip a character?

This requires understanding what are characters and bytes. you must be clear about this concept !! In UTF-8 encoding, a Chinese character occupies three bytes. The first line of the original line is "あゆ-Seasons", because a Chinese character occupies three bytes, therefore, the pointer is moved three bytes backward, which is exactly the position of a Chinese character. the pointer is moved to the back of the "bang". if you read a file, you can start to read it from the back of the word, therefore, the following figure shows "Kawasaki zookeeper-Seasons ". (A Japanese character also occupies three bytes in UTF-8 encoding .)

The following example shows the meaning of "absolute position.

F1 = open ('Seasons. lrc ', mode = 'r ')

Print f1.tell () # After opening the file, the default position of the pointer is 0.

>>> 0

F1.seek () # moves to three bytes in the file.

Print f1.tell ()

>>> 3

F1.seek (3, 0)

Print f1.tell ()

>>> 3

File. seek (n, 1): (Mode 1) Mode 1, relative position, n represents the pointer moving several bytes backward at the current position.

If you think that I am not very easy to understand, I guess you will understand the following examples.

F1 = open ('Seasons. lrc ', mode = 'r ')

Print f1.tell () # After opening the file, the default position of the pointer is 0.

>>> 0

F1.seek () # move the pointer back to the position of three bytes.

Print f1.tell ()

>>> 3 # the pointer is moved to the position of 3rd bytes.

F1.seek () # Here is the focus. move the pointer three bytes backward (here we can compare the difference between Mode 1 and mode 0 .)

Print f1.tell ()

>>> 6 # the pointer is located at 6th bytes. This indicates that the 1 mode is not counted from the beginning of the file, instead, it starts to move backward based on the position where the pointer was last located. (This is the meaning of "relative position .)

If you still don't understand it, you can add it as follows.

At last, I want to add:

F1.seek (3rd) means to move the pointer to the position of the file's bytes. (Absolute position)

F1.seek () refers to the position where the pointer is moved three bytes backward from the current position. (Relative location)

File. seek (-n, 2): It is an absolute position, starting from the end of the file and moving to the beginning of the file. (When using the 2 mode, note that the position where the pointer is moved can only be negative, because it is moving forward from the end !)

The following is an example:

# Open a file and use the read method to read the end of the file from the beginning. when the file is read, the pointer will naturally move back to the end of the file.

F1 = open('seasons.txt ', mode = 'r ')

Print f1.tell ()

>>> 0

F1.read ()

Print f1.tell ()

>>> 756

Now we know through the above method that the end of the file is the number of bytes of the file.

Next, we will test whether the 2 mode function of the seek method is as mentioned earlier, starting from the end of the file and moving to the beginning of the file.

F1 = open('seasons.txt ', mode = 'r ')

Print f1.tell ()

>>> 0

F1.seek (-) # use the 2 mode of seek to move the pointer one byte forward from the end of the file

Print f1.tell ()

>>> 755

# The end of the file is 756, and the forward byte is 755, which gets the desired effect.

In fact, the two modes of the seek method are quite useful. let's take a closer look at the two modes of the seek method.

You can use the 2 mode of the seek method to obtain the last 1st rows of the file to the nth row.

At this time, someone may ask, it is not very easy to retrieve the last row of the file. as long as you use the readlines method to read all the files and then take the last element, you can get the last row, isn't this method simpler?

Like this.

F1 = open('seasons.txt ', mode = 'r ')

Print f1.readlines () [-1]

>>> Why? There are already too many other

The last row of the file is indeed taken out, but have you ever wondered whether the essence of reading the file using readlines is to read all the lines of the file into the memory. if the file is very large, for example, 10 GB, 100 GB, which is as large as the current memory cannot be placed. if this is the case, this method is not applicable.

The following method applies to large files.

F1 = open('seasons.txt ', mode = 'r ')

For I in f1: # The Direct for loop file handle does not read all files into the memory at a time, but reads a row from the file and takes a row.

Line_bytes =-36

While True:

F1.seek (line_bytes, 2)

Data = f1.readlines ()

If len (data)> 1:

Print data [-1]

Break

Else:

Line_bytes * 2

The above is a detailed introduction to python file processing. For more information, see other related articles in the first PHP community!

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.