One of the first things Python programmers learn is that they can easily iterate over the full contents of an open file:
f = open('/etc/passwd')
for line in f:
    print(line)
Note that the code above works because our file object "f" is an iterator. In other words, "f" knows what to do in the context of a loop or any other iteration context, such as a list comprehension.
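To make the "file object is an iterator" point concrete, here is a minimal sketch. It writes a small scratch file rather than reading /etc/passwd, so the file name and contents below are invented for illustration:

```python
import os
import tempfile

# A scratch file stands in for /etc/passwd so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "passwd_demo.txt")
with open(path, "w") as out:
    out.write("root:x:0:0\ndaemon:x:1:1\n")

f = open(path)
print(iter(f) is f)      # True: a file object is its own iterator
print(next(f).rstrip())  # root:x:0:0  (next() pulls one line at a time)
f.close()
```

Because the file is its own iterator, the `for` loop simply calls `next()` on it behind the scenes, one line at a time.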
Most of the students in my Python classes have backgrounds in other programming languages, where they were taught to always close files when they were done with them. So I wasn't surprised when, shortly after I introduced them to file operations in Python, they asked how to close a file.
The simplest answer is that we can close a file explicitly by calling f.close(). Once we close the file, the file object still exists, but we can no longer read the file's contents through it, and the object's printed representation also indicates that the file has been closed.
>>> f = open('/etc/passwd')
>>> f
<open file '/etc/passwd', mode 'r' at 0x...>
>>> f.read(5)
'##\n# '
>>> f.close()
>>> f
<closed file '/etc/passwd', mode 'r' at 0x...>
>>> f.read(5)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
----> 1 f.read(5)
ValueError: I/O operation on closed file
That said, when I program in Python, I rarely call the "close" method on a file explicitly. And chances are that you don't want or need to do that either.
The generally preferred best practice for opening a file is to use a "with" statement, as shown below:
with open('/etc/passwd') as f:
    for line in f:
        print(line)
The "with" statement invokes what Python calls a "context manager" on the file object. That is, it assigns "f" to the new file instance pointing at the contents of /etc/passwd. Within the block opened by "with", the file is open and can be read freely.
However, once Python exits the "with" block, the file is automatically closed. Trying to read from "f" after we have exited the block raises the same ValueError exception shown above. Thus, by using "with", you avoid having to close the file explicitly; Python magically, and silently, closes it for you.
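A quick way to see this for yourself (a minimal sketch using a temporary file rather than /etc/passwd) is to check the file object's `closed` attribute before and after the block exits:

```python
import os
import tempfile

# Create a small scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("hello\n")

with open(path) as f:
    print(f.closed)  # False: the file is open inside the block
print(f.closed)      # True: leaving the "with" block closed it for us
```

Note that "f" is still a valid name after the block; it is only the underlying file that has been closed.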
But what happens if you don't explicitly close the file? What if you're a bit lazy, and neither use a "with" block nor call f.close()? When is the file closed? When should it be closed?
I ask because I have been teaching Python for many years, and I'm convinced that trying to teach "with" and context managers alongside so many other topics is more than beginning students can absorb. When I mention "with" in introductory courses, I usually tell students that for now they can let Python close the file for them, which happens either when the file object's reference count drops to zero or when Python exits.
In my free e-mail course on working with files in Python, none of the solutions used "with", and I wanted to see what would happen. Sure enough, some people challenged me, saying that not using "with" models bad practice for readers and risks data not being written to disk.
I received many e-mails on this topic, so I asked myself: if we don't explicitly close the file, and don't use a "with" block, when does Python close the file? In other words, what happens if I let the file be closed automatically?
I had always assumed that Python closes the file when the object's reference count drops to zero, as the garbage collection mechanism cleans up the file object. This is hard to prove or verify when reading from a file, but it is easy when writing to one: when you write to a file, the contents are not immediately flushed to disk (unless you pass False, i.e. 0 for unbuffered, as the optional third "buffering" argument to "open"); they are flushed only when the file is closed.
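This buffering behavior can be demonstrated directly. The sketch below uses Python 3, where unbuffered mode is only available for binary-mode files (passing False, or 0, as the third positional argument is the Python 2 spelling of the same idea); the file names are invented for the example:

```python
import os
import tempfile

d = tempfile.mkdtemp()
buffered_path = os.path.join(d, "buffered.txt")
raw_path = os.path.join(d, "unbuffered.bin")

f = open(buffered_path, "w")            # default buffering
f.write("abc\n")
print(os.path.getsize(buffered_path))   # 0: the data is still in Python's buffer

g = open(raw_path, "wb", buffering=0)   # unbuffered (binary mode only in Python 3)
g.write(b"abc\n")
print(os.path.getsize(raw_path))        # 4: written straight through to the OS

f.close()
print(os.path.getsize(buffered_path))   # 4: close() flushed the buffer
g.close()
```

It is precisely this "nothing on disk until the flush" window that makes writing a convenient probe for when a file is actually closed.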
So I decided to run some experiments to better understand what Python does for me automatically. The experiments consisted of opening a file, writing data to it, deleting the reference, and exiting Python, checking at each stage whether the data had been written.

The experiment looked like this:
f = open('/tmp/output', 'w')
f.write('abc\n')
f.write('def\n')
# Check the contents of /tmp/output (1)
del(f)
# Check the contents of /tmp/output (2)
# Exit from Python
# Check the contents of /tmp/output (3)
I ran the first experiment on a Mac with Python 2.7.9. It showed that at stage (1) the file existed but was empty, and that at stages (2) and (3) the file contained all of the content. Thus my initial intuition about CPython 2.7 seemed correct: when a file object is garbage collected, its __del__ (or equivalent) method flushes and closes the file. And running the "lsof" command against my IPython process showed that the file really was closed once the reference was removed.
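The reference-counting mechanism behind this can be observed directly. The following sketch is CPython-specific, which is exactly the point: immediate finalization on a zero reference count is an implementation detail, not a language guarantee:

```python
events = []

class Tracked:
    """Appends to 'events' when the instance is finalized."""
    def __del__(self):
        events.append("finalized")

obj = Tracked()
print(events)  # []: the object is still referenced
del obj        # the reference count drops to zero...
print(events)  # ['finalized']: ...and CPython finalizes it immediately
```

A file object behaves the same way in CPython: when its reference count hits zero, its cleanup code runs right away, flushing and closing the file.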
What about Python 3? I ran the same experiment with Python 3.4.2 on the Mac and got identical results: removing the last reference to the file object caused the file to be flushed and closed.
That's fine for CPython 2.7 and 3.4. But what about alternative implementations such as PyPy and Jython? Perhaps they behave differently.
So I ran the same experiment under PyPy 2.7.8. And this time, I got a different result! Deleting the reference to the file object, stage (2), did not cause the file's contents to be flushed to disk. I have to assume this is related to the different garbage collection mechanisms at work in PyPy and CPython. But if you run your program under PyPy, you should never expect the file to be flushed and closed just because the last reference to the file object has gone away. The "lsof" command showed that the file was not released until the Python process exited.
For fun, I decided to try Jython 2.7b3. Jython showed the same behavior as PyPy: deleting the reference did not flush the data, and only exiting the interpreter ensured that the buffered data was written to disk.
I then repeated these experiments, replacing 'abc\n' and 'def\n' with 'abc\n' * 1000 and 'def\n' * 1000.
In Python 2.7, nothing was written after the 'abc\n' * 1000 statement executed. But after the 'def\n' * 1000 statement executed, the file contained 4,096 bytes, which probably represents the size of the buffer. Calling del(f) to remove the reference to the file object caused the data to be flushed to disk and the file to be closed, at which point the file held all 8,000 bytes. So Python 2.7 behaves essentially the same regardless of string size; the only difference is that if you exceed the buffer size, some of the data is written to disk before the final flush that happens when the file is closed.
Python 3 behaved a little differently: no data was written after either call to f.write, presumably because of a larger buffer. But there was no doubt that deleting the last reference to the file object caused the file to be flushed and closed.
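Under CPython 3, the large-write experiment can be reproduced in a single script. This is a sketch: the default buffer size (8,192 bytes in current CPython) and the flush-on-del behavior are both CPython implementation details:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")
f = open(path, "w")
f.write("abc\n" * 1000)
f.write("def\n" * 1000)

# 8,000 bytes were written, but CPython's default buffer can hold
# all of them, so little or nothing has reached the disk yet.
print(os.path.getsize(path) < 8000)  # True

del f  # in CPython, the reference count hits zero: flush and close
print(os.path.getsize(path))  # 8000
```

Under PyPy or Jython, the `del f` line would not reliably flush anything, and the final size check could fail until the process exits.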
As for PyPy and Jython, the results with large writes matched those with small ones: the file was flushed and closed when the PyPy or Jython process exited, not when the last reference to the file object disappeared.
To double-check, I repeated the experiments using "with". In every case, we could easily predict when the file would be flushed and closed: when the block was exited, and the context manager called the appropriate method behind the scenes.
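One reason the close point is so predictable is that a "with" block is roughly equivalent to a try/finally construct, sketched here with a temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("line one\nline two\n")

# Roughly what "with open(path) as f:" does for us:
f = open(path)
try:
    for line in f:
        print(line, end="")
finally:
    f.close()  # runs even if the loop raises, just like leaving a "with" block

print(f.closed)  # True
```

The context manager's exit method plays the role of the "finally" clause, which is why it fires at the same, well-defined moment on every implementation.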
In other words, if you don't use "with", your data isn't necessarily in danger of being lost, at least in very simple cases. But you still can't be sure whether it will be saved when the last reference to the file object goes away or only when the program exits. If you assume that a file will be closed when a function returns, because the only reference to it was a local variable, you may be in for a surprise. And if you have multiple processes or threads writing to the same file, you really need to be very careful.
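If you need the data to be on disk at a known point, regardless of which implementation you run on, the safest approach is to flush explicitly rather than relying on garbage collection. A sketch (os.fsync asks the operating system to push its own caches to the hardware; the file name is invented):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "records.txt")
f = open(path, "w")
f.write("important record\n")

f.flush()             # push Python's buffer out to the operating system
os.fsync(f.fileno())  # ask the OS to push its cache to the disk as well

# The data is safely written even though the file is still open:
print(os.path.getsize(path))  # 17
f.close()
```

This works identically on CPython, PyPy, and Jython, because it doesn't depend on when the garbage collector decides to finalize the file object.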
Perhaps this behavior could be better defined, so that it works consistently across implementations? Perhaps we could even start from a Python specification, rather than pointing at CPython and saying, "Yeah, whatever that version does is right."
I still think "with" and context managers are great. And I also think they can be hard for Python novices to understand. But I do have to remind novice developers that if they decide to use one of the alternative Python implementations, there are many quirks that differ from CPython, and if they aren't careful, they may get bitten.