Python multithreading (1)

Source: Internet
Author: User

 

Developing multithreaded systems is a common requirement in everyday software development. Most modern programming languages support multithreaded development well, whether through libraries or through mechanisms built into the language itself, and Python is no exception.

As a dynamic language, Ruby also provides multithreading support. Before Ruby 1.9, however, threads and thread scheduling were simulated entirely inside the language implementation; the thread mechanism of the operating system was not used (in what follows, we will call OS-level threads native threads). Ruby 1.9 integrated YARV as its new virtual machine, and YARV introduced native operating system threads into Ruby: every Ruby thread is now a thread of the operating system. Ruby also maintains a global resource lock; a Ruby thread must first acquire this lock to become the active thread and use the global resources of the Ruby virtual machine.

Python arrived at the same design long ago. Threads in Python have been native operating system threads from the very beginning, and the Python virtual machine likewise uses a global interpreter lock (GIL) to serialize access to the virtual machine.

15.1 The GIL and Thread Scheduling

To understand why Python needs the global interpreter lock (GIL), consider the following situation. Suppose two threads A and B both hold a reference to the same object obj in memory; that is, the value of obj->ob_refcnt is 2. If A destroys its reference to obj, it will decrement the reference count of obj through Py_DECREF. The action of Py_DECREF can be divided into two parts:

--obj->ob_refcnt;
if (obj->ob_refcnt == 0)
    /* destroy the object and free its memory */

Suppose that after A executes the first action, obj->ob_refcnt has become 1. Unfortunately, at this exact moment the thread scheduler suspends A and wakes up B, and B also begins destroying its reference to obj. After B completes the first action, obj->ob_refcnt is 0. B is the lucky one: it runs to completion without being interrupted by the scheduler, destroys the object, and frees the memory. Now A wakes up again, but the damage is already done: obj->ob_refcnt has been reduced to 0 by B, not the 1 that A last observed. Following the convention, the unsuspecting A proceeds to destroy an already-destroyed object and free already-freed memory. What happens then? Heaven only knows.
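The lost-update hazard described above lives inside CPython's C code, but the same pattern can be reproduced at the Python level. The following minimal sketch (our own illustration, not from the original text) shows why a shared counter still needs an explicit lock even under the GIL: `count += 1` compiles to several bytecodes (load, add, store), and a thread switch between them can lose an update, exactly like the interrupted Py_DECREF.

```python
import threading

# A shared counter. "count += 1" is NOT atomic: the GIL only
# guarantees one bytecode runs at a time, and the increment is
# several bytecodes, so a switch in between can lose updates.
count = 0
lock = threading.Lock()

def worker(n):
    global count
    for _ in range(n):
        with lock:          # serialize the read-modify-write
            count += 1

threads = [threading.Thread(target=worker, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(count)  # 400000 -- guaranteed only because of the lock
```

Without the `with lock:` line, the final count may fall short of 400000 on some runs, which is the Python-level analogue of the double-free scenario above.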

To support multithreading, a basic requirement is mutual exclusion between threads for access to shared resources. Python is no exception, and this is the root cause of the GIL's introduction. The GIL is a very heavy-handed mutex. As its name implies, it is an interpreter-level (in this chapter we use "interpreter" and "virtual machine" interchangeably) lock: once one thread has acquired access to the interpreter, all other threads must wait for it to release that access, even when their next instructions would not affect one another at all. This protection may seem far too coarse-grained; in principle we only need to protect resources that can actually be shared between threads, while resources that cannot be shared need no protection. In fact, such a fine-grained solution did appear in Python's history, but, surprisingly, for multithreaded programs on a single processor it was less efficient than the GIL solution. Python's multithreading mechanism was therefore built on top of the GIL.

Of course, this design also means that at any given moment, only one thread can execute inside the Python virtual machine. Note that "at any given moment" makes little difference on a single processor, which by nature cannot run threads in parallel anyway. On a multiprocessor the situation is completely different: multiple threads really could run simultaneously, and the GIL prevents exactly that, effectively degrading a multiprocessor to a single processor and hurting performance. The Python community has long been aware of this and has explored it extensively. Around 1999, Greg Stein and Mark Hammond created a branch of Python 1.5 that removed the GIL, but unfortunately in many benchmarks, particularly single-threaded ones, it ran at only about half the speed of the GIL-based Python: the fine-grained locking it required led to a large number of lock and unlock operations, and locking is a heavyweight operation for the operating system. On top of that, without the GIL's protection, writing Python extension modules becomes much harder. So, up to the latest version at the time of writing, Python 2.5, the GIL remains the cornerstone of the multithreading mechanism, and we will keep our attention on the single-processor case. In fact, on the Python 3000 mailing list in February, Guido, the creator of Python, suggested a feasible direction: on a multiprocessor, create multiple Python processes that make full use of the processors and communicate with one another through IPC. Of course, Guido only raised the idea and did not reveal many implementation details.
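Guido's idea of multiple processes communicating through IPC was later realized in the standard library's multiprocessing module (added in Python 2.6, after this chapter was written). A minimal sketch of the idea, assuming a modern Python on a POSIX-like platform: each worker is a separate process with its own interpreter and its own GIL, so the work can truly run in parallel on a multiprocessor.

```python
from multiprocessing import Pool

def square(x):
    # Runs in a separate process, with its own interpreter and GIL.
    return x * x

def parallel_squares(n, workers=4):
    # The Pool ships arguments and results between processes via IPC,
    # exactly the kind of design Guido sketched on the py3k list.
    with Pool(processes=workers) as pool:
        return pool.map(square, range(n))

if __name__ == "__main__":
    print(parallel_squares(8))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The `if __name__ == "__main__"` guard matters: on platforms that spawn rather than fork, each worker re-imports the module, and the guard prevents it from recursively creating pools.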

Figure 15-1 shows a rough model of Python's multithreading mechanism.

Figure 15-1 A rough model of the Python thread mechanism

From the previous analysis, we know that the bytecode interpreter is the core of Python, so Python uses the GIL to give threads mutually exclusive access to the interpreter. In Figure 15-1, the three personified threads A, B, and C all need to use the interpreter to execute bytecode and perform some computation, but before doing so each must obtain the GIL, because the GIL guards the door to the bytecode interpreter. Once one thread (A) has obtained the GIL, the other two (B and C) can only wait for A to release it before they can enter the interpreter and do their own computation. In fact, Python's GIL protects not only the interpreter but also Python's C API; in hybrid C/C++ and Python development, the GIL is likewise required for mutual exclusion between native threads and Python threads. We will discuss this in detail later.

So when will A release the GIL? If A released it only after it was completely finished with the interpreter, parallel computation would degrade into serial computation, and the multithreading mechanism would be pointless. There is no doubt, then, that Python must have a thread scheduling mechanism.

Like process scheduling in an operating system, a thread scheduling mechanism must above all answer two questions:

When will the current thread be suspended and the next thread in the waiting state be selected?

Among the many pending candidate threads, which thread should be activated?

In Python's multithreading mechanism, these two questions are answered at different layers. The question of when to switch threads is decided by Python itself. Consider how an operating system switches processes: after a process has run for some time, a clock interrupt fires, the operating system responds to it, and process scheduling begins. Similarly, Python simulates such clock interrupts in software to trigger thread scheduling. We know that the Python bytecode interpreter works by executing instructions one by one, in order. Python internally maintains a value that serves as its internal clock: if the value is N, it means a thread should yield to the scheduling mechanism after executing N instructions. Figure 15-2 shows how to obtain the default value set in Python.

Figure 15-2 Obtaining the interval of the simulated clock interrupt

The result shown in Figure 15-2 means that in Python 2.5, the default behavior is to trigger the thread scheduling mechanism after every 100 instructions. In fact, this value is not only used for thread scheduling; Python also uses it internally to check whether asynchronous events have occurred and must be handled. The value can be adjusted through sys.setcheckinterval.
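A note for readers on modern Python: the instruction-count clock described here was later replaced. sys.setcheckinterval was deprecated in Python 3.2 and removed in 3.9 in favor of a time-based sys.setswitchinterval, where the GIL is requested from the running thread every "switch interval" seconds. A small sketch of the current API:

```python
import sys

# Python 2 counted instructions: sys.getcheckinterval() -> 100.
# Since Python 3.2 the switch is time-based instead.
interval = sys.getswitchinterval()
print(interval)  # 0.005 by default (5 milliseconds)

# The interval can be tuned, e.g. raised to reduce switching overhead
# in compute-heavy code at the cost of responsiveness:
sys.setswitchinterval(0.01)
assert sys.getswitchinterval() == 0.01

sys.setswitchinterval(interval)  # restore the default
```

The trade-off is the same one this chapter describes for the 100-instruction default: a shorter interval makes threads more responsive, a longer one reduces scheduling overhead.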

Now we know when Python triggers thread scheduling. When a thread obtains the GIL needed to access the Python interpreter and enters it, Python's internal monitoring mechanism kicks in: after the thread has executed 100 instructions, the interpreter forces it to suspend and switches to the next waiting thread.

So which of the many waiting threads will Python choose? The answer is: Python has no idea. Python does not intervene in this question at all; it hands the decision entirely to the underlying operating system. In other words, Python relies on the thread scheduling mechanism of the underlying operating system to decide which thread enters the Python interpreter next.

This is crucial. It means that threads in Python really are native threads supported by the operating system, not simulated threads implemented inside Python itself. Python's multithreading mechanism is built on the operating system's native threads, with a different underlying implementation for each operating system; on top of these native threads, Python provides a unified abstraction that gives its users a simple and convenient multithreading toolbox: the two modules thread and threading.
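A minimal sketch of the high-level threading module mentioned above (our own example; in Python 3 the low-level thread module was renamed _thread). It hands scheduling to the OS exactly as the text describes, and uses the thread-safe queue module to collect results:

```python
import threading
import queue

# threading is the high-level wrapper over the raw thread/_thread
# module; each Thread maps to one native OS thread, and the OS
# scheduler -- not Python -- decides which runs next.
results = queue.Queue()  # thread-safe by design

def worker(name):
    results.put(name)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # wait for each native thread to finish

# Arrival order is up to the OS scheduler, so we sort for a
# deterministic result.
order = sorted(results.get() for _ in range(3))
print(order)  # [0, 1, 2]
```

Note that the unsorted arrival order can differ from run to run: that nondeterminism is the OS scheduler at work, as the chapter explains.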

 
