An analysis of when and how Python closes files


If "with" is not used, when will Python close the file? The answer is: depending on the situation.

One of the first things Python programmers learn is that they can easily iterate over the lines of an open file:

f = open('/etc/passwd')
for line in f:
    print(line)

Note that the above code works because our file object "f" is an iterator. In other words, "f" knows what to do inside a for loop or any other iteration context, such as a list comprehension.
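
To see this for yourself, a quick check of the iterator protocol is enough; the following is just an illustrative sketch, not taken from the original article:

f = open('/etc/passwd')
print(iter(f) is f)   # True: the file object is its own iterator
print(next(f))        # next() hands back one line at a time
f.close()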

Most students in my Python classes come from other programming languages, where they were always expected to close files explicitly. So I was not surprised that, shortly after I introduced Python file operations, they asked how to close a file in Python.

The simplest answer is that we can explicitly close the file by calling f.close(). Once we have closed the file, the file object still exists, but we can no longer read from it, and the file object's printed representation also indicates that the file has been closed.

>>> f = open('/etc/passwd')
>>> f
<open file '/etc/passwd', mode 'r' at 0x10f023270>
>>> f.read(5)
'##\n# '
>>> f.close()
>>> f
<closed file '/etc/passwd', mode 'r' at 0x10f023270>
>>> f.read(5)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-ef8add6ff846> in <module>()
----> 1 f.read(5)
ValueError: I/O operation on closed file

That said, when I program in Python I rarely call the "close" method explicitly. Moreover, you probably don't want to, and usually don't have to, either.

The best practice for opening a file is to use the "with" statement, as shown below:

with open('/etc/passwd') as f:
    for line in f:
        print(line)

The "with" Statement calls the "context manager" method in Python for the "f" file object. That is, it specifies "f" as a new file instance pointing to/etc/passwd content. In the code block opened by "with", the file is opened and can be read freely.

However, once the Python code exits from the "with" block, the file is automatically closed. Trying to read from f after we leave the "with" block raises the same ValueError as above. Therefore, by using "with", you avoid having to close the file explicitly; Python quietly closes it for you behind the scenes, in typical Pythonic fashion.
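
Roughly speaking, the "with" block above behaves like the following explicit try/finally; this is only a sketch of the idea, glossing over the details of the context-manager protocol:

f = open('/etc/passwd')
try:
    for line in f:
        print(line)
finally:
    f.close()   # runs whether or not the loop raised an exception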

But what if you don't close the file explicitly? What if you are a little lazy and use neither a "with" block nor a call to f.close()? When will the file be closed? When should it be closed?

I ask this question because I have been teaching Python for many years, and I am convinced that teaching "with" and context managers in detail is more than my introductory students can absorb. When "with" comes up in an introductory course, I usually just tell students that if they run into this question later in their careers, the file will be closed either when the reference count of the file object drops to 0 or when Python exits.
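
If you want to watch that reference count yourself, sys.getrefcount makes it visible; a minimal sketch (the exact numbers vary by implementation, and the count is inflated by one for the temporary argument passed to getrefcount):

import sys

f = open('/etc/passwd')
print(sys.getrefcount(f))   # the name f, plus the temporary argument
g = f                       # add a second reference to the same file object
print(sys.getrefcount(f))   # one higher than before
del g
del f                       # in CPython the count now hits 0 and the file is closed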

In my free e-mail course on Python file operations, I did not use "with" in all of the solutions, and I was curious how that would go over. Sure enough, several people challenged me, saying that presenting solutions without "with" teaches bad practice and risks data not being written to disk.

I received a lot of e-mail about this topic, so I asked myself: if we neither close the file explicitly nor use a "with" block, when does Python actually close the file? In other words, what happens if I leave it to Python to close the file automatically?

I had always assumed that Python closes the file when the reference count of the file object drops to 0 and the garbage collector reclaims it. This is hard to prove or verify when reading a file, but it is easy when writing one. That is because when you write to a file, the content is not immediately flushed to disk (unless you disable buffering by passing 0 as the third, optional "buffering" argument to "open"); it is only guaranteed to be flushed when the file is closed.
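
Conversely, if you want the data on disk before the file is closed, you can flush explicitly or disable buffering; a sketch:

# Flush buffered data out to the OS without closing the file
f = open('/tmp/output', 'w')
f.write('abc\n')
f.flush()
f.close()

# Or open the file unbuffered (Python 3 requires binary mode for this)
g = open('/tmp/output', 'wb', 0)
g.write(b'abc\n')   # handed to the OS immediately, no userspace buffer
g.close()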

So I decided to do some experiments to better understand what Python does for me automatically. My experiment consists of opening a file, writing data to it, deleting the reference, and exiting Python, and I was curious to see at which point, if ever, the data would be written to disk.

My experiment looks like this:

f = open('/tmp/output', 'w')
f.write('abc\n')
f.write('def\n')
# check contents of /tmp/output (1)
del(f)
# check contents of /tmp/output (2)
# exit from Python
# check contents of /tmp/output (3)
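
One way to perform the checks is simply to run cat /tmp/output from another terminal at each stage. A Python-only variant, using a small helper of my own (not part of the original experiment), might look like this:

import os

def report(stage):
    # Hypothetical helper: show how many bytes of the test file are on disk
    size = os.path.getsize('/tmp/output') if os.path.exists('/tmp/output') else None
    print('stage %s: %r bytes on disk' % (stage, size))

f = open('/tmp/output', 'w')
f.write('abc\n')
f.write('def\n')
report(1)
del(f)
report(2)
# stage 3 has to be checked from outside, after the interpreter has exited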

I ran my first experiment with Python 2.7.9 on a Mac. At stage 1 the file exists but is empty; at stages 2 and 3 the file contains all of the content. So in CPython 2.7 my initial intuition seems to be correct: when the file object is reclaimed by the garbage collector, its __del__ method (or the equivalent) flushes and closes the file. Moreover, running the "lsof" command against my IPython process showed that the file really was closed once the reference was deleted.
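
The lsof check itself runs from the shell, but it can also be driven from inside the interpreter; a sketch, assuming lsof is installed and ignoring differences in its output format:

import os
import subprocess

# List the files held open by the current interpreter and keep only our test file
out = subprocess.check_output(['lsof', '-p', str(os.getpid())])
print([line for line in out.splitlines() if b'/tmp/output' in line])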

What about Python 3? I ran the same experiment with Python 3.4.2 on the Mac and got the same results: once the final reference to the file object was removed, the file was flushed and closed.

So far so good for Python 2.7 and 3.4. But what about alternative implementations such as PyPy and Jython? There, the situation turns out to be somewhat different.

So I ran the same experiment under PyPy 2.7.8, and this time I got different results! After the file object was deleted, that is, at stage 2, the file content had not been flushed to disk. I have to assume this is due to differences between the garbage collection mechanisms (or other machinery) in PyPy and CPython. In any case, if you run your programs under PyPy, you should never expect a file to be flushed and closed just because the last reference to the file object has gone away. The lsof command showed that the file was not released until the Python process exited.

Just for fun, I also tried Jython 2.7b3, and Jython showed the same behavior as PyPy. That is, only exiting from Python guaranteed that the buffered data was written to disk.

I then repeated these experiments, but changed "abc\n" and "def\n" to "abc\n" * 1000 and "def\n" * 1000, as sketched below.
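
The modified experiment is the same as before apart from the larger strings; as a sketch:

f = open('/tmp/output', 'w')
f.write('abc\n' * 1000)
# check contents of /tmp/output (1)
f.write('def\n' * 1000)
# check contents of /tmp/output (2)
del(f)
# check contents of /tmp/output (3)
# exit from Python, then check a final time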

In Python 2.7, nothing is written to the file after the "abc\n" * 1000 write. However, after the "def\n" * 1000 write, the file contains 4096 bytes, which probably reflects the buffer size. When del(f) removes the last reference to the file object, the data is flushed to disk and the file is closed; at that point the file contains all 8000 bytes. So, regardless of the string size, Python 2.7 behaves basically the same way. The only difference is that if the buffer size is exceeded, some of the data is written to disk before the file is finally closed.

In Python 3, the situation is slightly different: no data is written to disk after the f.write calls. However, once the last reference to the file object goes away, the file is flushed and closed. This is presumably because of a larger buffer, but either way, there is no doubt that removing the last reference to the file object flushes and closes the file.
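
That "larger buffer" guess is easy to sanity-check, since Python 3's io module exposes the default buffer size; a quick sketch:

import io

# Default size of the buffer behind buffered file objects in Python 3
print(io.DEFAULT_BUFFER_SIZE)   # typically 8192 bytes, more than the 8000 bytes written here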

For PyPy and Jython, the behavior with large writes is the same as with small ones: the file is flushed and closed when the PyPy or Jython process exits, not when the last reference to the file object goes away.

Just to be sure, I also ran the experiments using "with". In every case, we can easily predict when the file is flushed and closed: it happens when we exit the block and the context manager calls the appropriate method behind the scenes.
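
For completeness, the "with" version of the write experiment looks like this; here the flush-and-close point is simply the end of the block:

with open('/tmp/output', 'w') as f:
    f.write('abc\n')
    f.write('def\n')
# the file has been flushed and closed by the time we reach this line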

In other words, if you do not use "with", your data is not necessarily lost, at least in very simple situations. However, you cannot be sure whether it is saved when the last reference to the file object goes away or only when the program exits. If you assume the file will be closed when a function returns because the only reference to it is a local variable, you may be in for a surprise. And if multiple processes or threads write to the same file, you need to be very careful indeed.
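
The function-return case mentioned above is worth spelling out; a sketch, using a hypothetical write_log helper of my own:

def write_log(message):
    # Hypothetical helper: the only reference to the file is the local variable f
    f = open('/tmp/app.log', 'a')
    f.write(message + '\n')
    # No close() and no "with": CPython closes the file when f goes out of scope
    # at the end of the function, but PyPy and Jython make no such promise.

write_log('starting up')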

Perhaps this behavior could be better defined. Shouldn't it be consistent across implementations? Perhaps we could even see the beginnings of a Python specification, rather than pointing at CPython and saying, "Yeah, whatever that version does is correct."

I still think "with" and context manager are great. In addition, it is difficult to understand the working principle of "with" for beginners of Python. However, I still have to remind novice developers that if they decide to use other optional versions of Python, there will be many strange situations different from CPython, and if they are not careful enough, it is even affected.
