Python File Reading Summary
You want to read text or data from a file using Python.
1. The most convenient method is to read the entire content of the file at once into one large string:
all_the_text = open('thefile.txt').read()     # all the text in a text file
all_the_data = open('abinfile', 'rb').read()  # all the data in a binary file
For safety, it is better to bind the opened file object to a name, so that the file can be closed promptly once the operation completes and no useless file objects linger in memory. For example, to read a text file:
file_object = open('thefile.txt')
try:
    all_the_text = file_object.read()
finally:
    file_object.close()
The try/finally statement is not strictly required here, but it is better practice because it ensures the file object is closed even if a serious error occurs during reading.
2. The simplest, fastest, and most Pythonic way to read the content of a text file line by line into a list of strings:
list_of_all_the_lines = file_object.readlines()
Each line read this way keeps its trailing "\n" character. If you do not want that, there are alternatives, such as:
list_of_all_the_lines = file_object.read().splitlines()
list_of_all_the_lines = file_object.read().split('\n')
list_of_all_the_lines = [L.rstrip('\n') for L in file_object]
The simplest and fastest way to process a text file line by line is a plain for loop:
for line in file_object:
    process(line)   # placeholder for your per-line processing
This method also leaves the trailing "\n" on each line. If you do not want it, add one statement at the start of the for loop body:
line = line.rstrip('\n')
Or, if you want to strip all trailing whitespace from each line (not just '\n'), the common idiom is:
line = line.rstrip()
3. Discussion
Unless the file to be read is huge, reading all of it into memory and processing it there is the fastest and most convenient approach. The built-in function open creates a Python file object (you can also create one by calling the built-in type file). Calling the read method of that object reads all the content (whether text or binary data) into one large string. If the content is text, you can split it into a list of lines with the split method or the dedicated splitlines method. Because splitting a file into lines is such a common need, you can also call readlines directly on the file object, which is simpler and faster.
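As a minimal sketch of the equivalence just described (assuming a text file named 'thefile.txt', as in the solution above):
lines_keep_newlines = open('thefile.txt').readlines()        # each line still ends with '\n'
lines_no_newlines = open('thefile.txt').read().splitlines()  # line terminators are stripped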
You can apply a for loop directly to a file object, or pass the file object to anything that iterates over its argument, such as list or max. When treated as an iterable, an open file object yields one text line per iteration (so this applies only to text files). This line-by-line approach saves a great deal of memory and is also fast.
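For example (a sketch, again assuming 'thefile.txt' exists):
all_lines = list(open('thefile.txt'))             # list of lines, each still ending with '\n'
longest_line = max(open('thefile.txt'), key=len)  # the longest line in the file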
On UNIX and UNIX-like systems, such as Linux, Mac OS X, or other BSD variants, there is no difference between text files and binary files. On Windows and the old Macintosh systems, line breaks are '\r\n' and '\r' respectively, rather than the standard '\n', and Python translates them into '\n' for you. This means that when you open a binary file, you must tell Python explicitly so that it performs no translation; to do so, pass 'rb' as the second argument of open. On UNIX platforms this does no harm, and distinguishing text files from binary files is always a good habit, even though it is not a strict requirement there. Such habits make your program more readable, easier to understand, and more portable across platforms.
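A sketch of the difference, using the placeholder file names from the solution above:
text_content = open('thefile.txt', 'r').read()  # text mode: on Windows, '\r\n' is translated to '\n'
binary_data = open('abinfile', 'rb').read()     # binary mode: bytes returned exactly as stored, no translation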
If you are not sure which line-break convention a text file uses, pass 'rU' as the second argument of open to request universal newline translation. This lets you freely exchange files among Windows, UNIX (including Mac OS X), and older Macintosh platforms without worrying: regardless of the platform on which your code runs, every kind of line break is mapped to '\n'.
You can call the read method directly on the file object produced by open, as shown in the first code snippet of the solution. When you do so, you no longer hold a reference to the file object once the read is finished. In practice, Python notices that the file object is immediately unreferenced and closes it promptly. However, it is better to bind the result of open to a name so that you can close the file explicitly after processing is done. This keeps the file open for as short a time as possible, even on Jython, IronPython, or other Python implementations whose more advanced garbage collectors may delay collection (unlike the current C-based implementation, CPython, which reclaims the object immediately). To guarantee that the file object is closed even if an error occurs during processing, use a try/finally statement, which is a robust and rigorous approach:
file_object = open('thefile.txt')
try:
    for line in file_object:
        process(line)
finally:
    file_object.close()
Note: do not put the open call inside the try clause of the try/finally statement (a common beginner mistake). If an error occurs while opening the file, there is nothing to close, and nothing is bound to the name file_object, so you certainly should not call file_object.close().
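To illustrate the mistake (a sketch of what not to do; do_something_with is a placeholder, as elsewhere in this recipe):
# WRONG: if open() fails, file_object is never bound, and the finally
# clause itself then raises a NameError instead of the real error
try:
    file_object = open('thefile.txt')
    for line in file_object:
        do_something_with(line)
finally:
    file_object.close()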
If you choose to read the file a small piece at a time rather than all at once, the approach is slightly different. The following example reads a binary file 100 bytes at a time until it reaches the end of the file:
file_object = open('abinfile', 'rb')
try:
    while True:
        chunk = file_object.read(100)
        if not chunk:
            break
        do_something_with(chunk)
finally:
    file_object.close()
Passing an argument N to the read method ensures that it reads at most N bytes (possibly fewer, if the read position is close to the end of the file). When the end of the file is reached, read returns an empty string. It is best to wrap complicated loops in reusable generators. In this case we can only encapsulate part of the logic, because a generator's yield keyword is not allowed inside the try clause of a try/finally statement. Giving up the try/finally protection that guarantees the file is closed, we can write:
def read_file_by_chunks(filename, chunksize=100):
    file_object = open(filename, 'rb')
    while True:
        chunk = file_object.read(chunksize)
        if not chunk:
            break
        yield chunk
    file_object.close()
Once the read_file_by_chunks generator is available, the code that reads and processes a binary file in fixed-size chunks becomes very simple:
for chunk in read_file_by_chunks('abinfile'):
    do_something_with(chunk)
Reading text files line by line is even more common; you only need to loop over the file object:
for line in open('thefile.txt', 'rU'):
    do_something_with(line)
To be completely sure that no needlessly open file objects remain after the operation completes, you can make the code above more rigorous and robust:
file_object = open('thefile.txt', 'rU')
try:
    for line in file_object:
        do_something_with(line)
finally:
    file_object.close()