Python memory management and the garbage collection algorithm explained

Source: Internet
Author: User

Summary

The collector finds reference cycles involving lists, tuples, instances, classes, dictionaries, and functions. Instances with a __del__ method are handled in a sane way. It is easy to add GC support to new types. Python built with GC support is binary compatible with regular Python.

Generational collection is working (currently with three generations). The overhead measured by pybench is about 4%. Virtually all extension modules should continue to work unchanged (I had to modify the new and cPickle modules in the standard distribution). A new module called gc is available for tuning the collector and setting debugging options.

The collector should be portable across platforms. The patched version of Python passes all regression tests and runs Grail, Idle, and Sketch without any problems.

The portable garbage collector is included in Python 2.0 and later versions and is enabled by default. Enjoy!

Why do we need garbage collection?

Python uses reference counting to manage allocated memory. Every object in Python has a reference count, which records how many references point to it. When the reference count drops to zero, the object is freed. Reference counting works well for most programs. However, it has one fundamental flaw: it cannot handle reference cycles. The simplest example of a reference cycle is an object that references itself. For example:

>>> l = []
>>> l.append(l)
>>> del l

The reference count of the list created above is now 1. However, since it can no longer be reached from inside Python and cannot be used again, it should be considered garbage. With reference counting alone, this list would never be freed.
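On a Python version that includes the cycle collector, this behavior can be observed directly. The following is a minimal sketch, assuming CPython and the standard gc and weakref modules. The Node class and the make_cycle helper are hypothetical names introduced only for this illustration; a small class is used instead of a plain list because built-in lists cannot be weakly referenced.

```python
import gc
import weakref


class Node:
    """A container object that can hold a reference to itself."""

    def __init__(self):
        self.me = None


def make_cycle():
    """Create a self-referencing object and return only a weak reference to it."""
    node = Node()
    node.me = node  # the object now references itself
    return weakref.ref(node)


was_enabled = gc.isenabled()
gc.disable()   # keep the automatic collector out of the way
gc.collect()   # start from a clean state

ref = make_cycle()
alive_before = ref() is not None   # reference counting alone did not free it

found = gc.collect()               # number of unreachable objects found
alive_after = ref() is not None    # the cycle collector has now freed it

if was_enabled:
    gc.enable()

print(alive_before, alive_after, found >= 1)
```

The weak reference lets the script check whether the object still exists without keeping it alive.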

In general, creating reference cycles is not good programming practice and can almost always be avoided. However, cycles are sometimes difficult to avoid, and sometimes the programmer is not even aware that one has been created. This problem is especially troublesome for long-running programs such as servers. Nobody wants a server to run out of memory because reference counting cannot free inaccessible objects caught in cycles. In large programs it can also be difficult to find out how the cycles are being created.

What is "traditional" garbage collection?

Conventional garbage collection (such as mark-and-sweep or stop-and-copy) usually works as follows:

1. Find the root objects of the system. Root objects are things like the global environment (such as the __main__ module in Python) and objects on the stack.
2. Search from these objects to find all reachable objects. These objects are all "alive".
3. Free all other objects.

Unfortunately, this approach cannot be used in Python as it is currently designed. Because of the way extension modules work, Python can never fully determine the root set. If the root set cannot be determined accurately, we risk freeing objects that are still referenced. Even if extension modules were designed differently, there is no portable way to find the objects that are currently on the C stack. In addition, reference counting provides benefits that Python programmers have come to expect, in terms of locality of memory reference and finalizer semantics. It would be best to find a way to keep reference counting while also freeing reference cycles.
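For comparison, the three conventional steps can be sketched on a toy object graph. This is only an illustration under simplifying assumptions: objects are plain strings, the edges dictionary and the mark and sweep function names are hypothetical, and the root set is assumed to be known exactly, which is precisely what Python cannot guarantee.

```python
def mark(roots, edges):
    """Return the set of objects reachable from the given roots."""
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        stack.extend(edges.get(obj, ()))
    return live


def sweep(all_objects, live):
    """Everything that was not marked as live is garbage."""
    return set(all_objects) - live


# Toy heap: each key is an object, each value lists the objects it references.
edges = {
    "main": ["a"],
    "a": ["b"],
    "b": [],
    "c": ["d"],  # c and d form a cycle that no root can reach
    "d": ["c"],
}

live = mark(["main"], edges)
garbage = sweep(edges, live)
print(sorted(live), sorted(garbage))
```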

How does this method work?

Conceptually, this approach does the opposite of traditional garbage collection. Instead of trying to find all reachable objects, it tries to find all unreachable objects. This is much safer because, if the algorithm fails, we are no worse off than with no garbage collection at all (apart from the time and space wasted).

Because we are still using reference counting, the garbage collector only has to find reference cycles; reference counting takes care of all other garbage. First, we observe that reference cycles can only be created by container objects. A container object is an object that can hold references to other objects. In Python, lists, dictionaries, instances, classes, and tuples are all examples of container objects. Integers and strings are not containers. With this observation, we realize that non-container objects can be ignored for the purposes of garbage collection. This is a useful optimization because objects such as integers and strings should stay lightweight.
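In a modern CPython, the gc module exposes gc.is_tracked(), which reports whether the collector is currently tracking a given object. The short sketch below assumes CPython and uses arbitrary sample values; it only checks integers, strings, and lists, because the tracking of some other container types can vary between versions.

```python
import gc

# Non-container objects are ignored by the cycle collector.
int_tracked = gc.is_tracked(42)
str_tracked = gc.is_tracked("text")

# Lists can hold references to other objects, so they are tracked.
list_tracked = gc.is_tracked([])

print(int_tracked, str_tracked, list_tracked)
```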

Our idea now is to keep track of all container objects. There are several ways to do this, but one of the best is a doubly linked list whose link fields are stored inside the object structure itself. This allows objects to be quickly inserted into and removed from the set without requiring any extra memory allocation. When a container is created, it is inserted into the set; when it is deallocated, it is removed from the set.
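The idea of keeping the link fields inside the objects can be sketched in Python as follows. This is a toy model, not the interpreter's C implementation: the Container and ContainerSet classes and the gc_prev and gc_next attribute names are assumptions made for illustration.

```python
class Container:
    """Toy container whose link fields live inside the object itself."""

    def __init__(self, name):
        self.name = name
        self.gc_prev = self
        self.gc_next = self


class ContainerSet:
    """Circular doubly linked list with a sentinel head node."""

    def __init__(self):
        self.head = Container("<head>")

    def insert(self, obj):
        """Link obj in at the tail; no extra allocation is needed."""
        tail = self.head.gc_prev
        tail.gc_next = obj
        obj.gc_prev = tail
        obj.gc_next = self.head
        self.head.gc_prev = obj

    def remove(self, obj):
        """Unlink obj in constant time using its own pointers."""
        obj.gc_prev.gc_next = obj.gc_next
        obj.gc_next.gc_prev = obj.gc_prev
        obj.gc_prev = obj.gc_next = obj

    def names(self):
        """Walk the list from the sentinel and collect the object names."""
        result = []
        node = self.head.gc_next
        while node is not self.head:
            result.append(node.name)
            node = node.gc_next
        return result


tracked = ContainerSet()
a, b, c = Container("a"), Container("b"), Container("c")
for obj in (a, b, c):
    tracked.insert(obj)   # done when a container is created
tracked.remove(b)         # done when a container is deallocated
print(tracked.names())
```

Because every object carries its own pointers, insertion and removal touch only the neighboring nodes.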

Now that we have access to all container objects, how do we find reference cycles? First, we add one more field to container objects in addition to the two link pointers. We call this field gc_refs. The following steps find the reference cycles:

1. For each container object, set gc_refs to the object's reference count.
2. For each container object, find the container objects it references and decrement each referenced container's gc_refs by one.
3. All container objects whose gc_refs is now greater than zero are referenced from outside the set of container objects. We cannot free these objects, so we move them to a different set.
4. Objects referenced by the objects just moved cannot be freed either. We move them, and all objects reachable from them, out of the current set as well.
5. The objects remaining in the current set are referenced only by other objects in that set (that is, they are unreachable from Python and are garbage). We can now free these objects.
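The steps above can be sketched on a toy model. This is a simplified illustration, not the interpreter's actual code: the Obj class, its simulated refcount field, and the link and find_garbage helpers are all hypothetical, and the sketch treats any object whose gc_refs remains above zero after the subtraction as externally referenced.

```python
class Obj:
    """Toy container with a simulated reference count."""

    def __init__(self, name):
        self.name = name
        self.refs = []      # containers this object references
        self.refcount = 0   # simulated reference count
        self.gc_refs = 0


def link(src, dst):
    """Make src reference dst and bump dst's simulated reference count."""
    src.refs.append(dst)
    dst.refcount += 1


def find_garbage(containers):
    # Step 1: copy each reference count into gc_refs.
    for obj in containers:
        obj.gc_refs = obj.refcount
    # Step 2: subtract references that come from inside the set.
    for obj in containers:
        for target in obj.refs:
            target.gc_refs -= 1
    # Step 3: anything still above zero is referenced from outside.
    reachable = [obj for obj in containers if obj.gc_refs > 0]
    seen = {id(obj) for obj in reachable}
    # Step 4: everything reachable from those objects must also be kept.
    stack = list(reachable)
    while stack:
        obj = stack.pop()
        for target in obj.refs:
            if id(target) not in seen:
                seen.add(id(target))
                stack.append(target)
    # Step 5: whatever is left is referenced only from within the set.
    return [obj for obj in containers if id(obj) not in seen]


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
link(a, b)
link(b, a)         # a and b form an isolated cycle
c.refcount += 1    # simulate a variable outside the set pointing at c
link(c, d)         # d is kept alive through c

garbage = find_garbage([a, b, c, d])
print(sorted(obj.name for obj in garbage))
```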

Finalizer Problems

One problem with our grand scheme is the use of finalizers. In Python, a finalizer is the __del__ method of an instance. With reference counting, finalizers work well: when an object's reference count drops to zero, the finalizer is called before the object is freed. This is straightforward and easy for programmers to understand.

With garbage collection, calling finalizers becomes a tricky problem, especially in the presence of reference cycles. If two objects in a reference cycle both have finalizers, what should be done? Which one should be called first? After the first finalizer has been called, its object cannot be freed because the second finalizer may still access it.

Because there is no good solution to this problem, cycles that are referenced from objects with finalizers are not freed. Instead, these objects are added to a global list of uncollectable garbage. It should always be possible to rewrite a program to avoid this problem. As a last resort, the program can read the global list and break the reference cycles in a way that makes sense for the application.
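One possible way to rewrite a program so that the cycle never forms is to make the back-reference weak. The article does not prescribe this technique; the sketch below is an assumption that uses the standard weakref module, with hypothetical Parent and Child classes and a log list introduced only to record when the finalizers run.

```python
import weakref

log = []  # records finalizer calls for this illustration


class Parent:
    def __init__(self):
        self.children = []

    def __del__(self):
        log.append("parent finalized")


class Child:
    def __init__(self, parent):
        # A weak back-reference does not raise the parent's reference count,
        # so parent and child never form a reference cycle.
        self.parent = weakref.ref(parent)
        parent.children.append(self)

    def __del__(self):
        log.append("child finalized")


parent = Parent()
child = Child(parent)

del child    # the child is still held by parent.children
del parent   # the last strong reference is gone; reference counting frees both

print(log)
```

Since no cycle exists, plain reference counting runs both finalizers and nothing is left for the cycle collector to handle.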

What is the cost?

As someone once said, there is no such thing as a free lunch. However, this form of garbage collection is fairly cheap. One of the biggest costs is the three extra words of memory required for each container object. There is also the overhead of maintaining the set of containers. For the current version of the collector, this overhead is about a 4% slowdown according to pybench.

The collector currently keeps track of three generations of objects. By tuning the parameters, the time spent collecting garbage can be made as small as desired. For some applications it may make sense to turn off automatic collection and call the collector explicitly at times when running time is not critical. However, running pybench with the default collection parameters does not seem to spend much time collecting garbage. Obviously, applications that allocate a large number of container objects will spend more time in collection.
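The tuning described here is exposed through the gc module. The following is a minimal sketch, assuming a Python version that ships the gc module; the threshold values are arbitrary examples rather than recommendations, and the placeholder comment stands in for whatever time-critical work an application performs.

```python
import gc

original = gc.get_threshold()      # tuple of three collection thresholds
gc.set_threshold(1000, 15, 15)     # arbitrary example values
tuned = gc.get_threshold()

was_enabled = gc.isenabled()
gc.disable()                       # turn off automatic collection
enabled_during = gc.isenabled()

# ... time-critical work would run here ...

unreachable = gc.collect()         # collect explicitly at a convenient moment

# Restore the previous settings.
gc.set_threshold(*original)
if was_enabled:
    gc.enable()

print(original, tuned, enabled_during, unreachable)
```

Disabling automatic collection only affects the cycle collector; reference counting keeps freeing objects as usual.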

The current patch adds a new configure option to enable the garbage collector. Python built with the garbage collector is binary compatible with standard Python. If the option is disabled, the behavior of the Python interpreter is not affected.

How can I use it?

You only need to download a current version of Python. The garbage collector is included in Python 2.0 and later and is enabled by default. If you are using Python 1.5.2, there is a patch for that older version that may work. If you are on Windows, you can download a replacement python15.dll.

Boehm-Demers conservative garbage collection

This patch modifies Python 1.5.2 to use the Boehm-Demers conservative garbage collector. You must first apply the gc-malloc-cleanup.diff patch. Reference counting is still used; the garbage collector only frees memory that reference counting did not (that is, reference cycles). This should give the best performance. To use it:

$ cd Python-1.5.2
$ patch -p1 < ../gc-malloc-cleanup.diff
$ patch -p1 < ../gc-boehm.diff
$ autoconf
$ ./configure --with-gc

This patch assumes that libgc.a is installed where the -lgc link option can find it (/usr/local/lib should work). If you do not have this library, download and install it before compiling.

So far, this patch has only been tested on Linux. It will probably work on other Unix machines as well. On my Linux machine, the GC version of Python passes all regression tests.
