Python Programming (II): Processes and Threads in Python



Multi-process, multithreaded programming

System programmers and DevOps developers are often asked a common question in interviews:

What is a process, what is a thread, and what is the difference between the two?

Admittedly, even after so many years this remains a hard question to answer well. Briefly:

Processes and threads share many properties: both can be scheduled by the CPU as a unit, and both have their own independent stack. For this reason a thread is also called an LWP (lightweight process), and by analogy a process is sometimes called an HWP (heavyweight process). From a thread's point of view, a process is simply a process with only one thread; a process with multiple threads can perform multiple tasks at once (and on a multi-core or SMT system, truly "at the same time"). Their similarities and differences can be discussed from the following aspects:

Scheduling

In traditional operating systems, the basic unit of CPU scheduling was the process. Once operating systems introduced the concept of threads, the thread became the basic unit of CPU scheduling, while the process remained the basic unit of resource ownership.

Parallelism

Before threads, a process could hold only one flow of execution; with threads, a process can have multiple flows executing in parallel. Many early HTTP servers used threads to handle concurrency, achieving several times the throughput of the older model that forked a child process per connection. All of this is because a thread provides concurrency at a much lower cost than a process.

Sharing

Linux threads generally inherit or share the following resources from their process:

    • The process's code (text) segment

    • The process's data segment (through this shared memory, threads can easily communicate with each other)

    • File descriptors (FDs) opened by the process

    • Signal handlers

    • The process's user ID (UID) and process group ID (PGID)
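This sharing is easy to observe from Python. A minimal sketch (the names `shared` and `worker` are illustrative): several threads update one module-level dict, which lives in the data segment shared by every thread of the process:

```python
import threading

# A module-level dict lives in the process's data segment, so every
# thread created by this process sees the very same object.
shared = {"counter": 0}
lock = threading.Lock()  # coordinate access to the shared data

def worker(n):
    for _ in range(n):
        with lock:                 # serialize updates to avoid lost writes
            shared["counter"] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["counter"])  # 4000: all four threads updated the same dict
```

Without the lock, the four threads would still see the same dict, but concurrent `+=` updates could be lost; the lock is what makes the shared access safe.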

Isolation

Each Linux thread independently owns (does not share) the following resources:

    • Thread ID. On Linux, thread IDs and process IDs share the same ID space; on other UNIX systems, the thread ID and the process ID are concepts at different levels.

    • Register values. This is the most essential guarantee that a thread can act as an independent scheduling unit.

    • The thread stack, which is what allows threads to run in parallel.

    • Priority. Linux is designed so that, apart from the differences in which resources are shared or isolated, threads and processes are treated almost identically, so threads can have their own priorities.
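Per-thread isolation is also visible from Python. A small sketch with made-up names: `threading.local()` gives each thread its own private attribute storage, much as each thread has its own private stack, so a value set in one thread is invisible to the others:

```python
import threading

# threading.local() gives each thread its own copy of its attributes,
# mirroring per-thread isolation: locals in worker() live on that
# thread's own stack and are equally private.
tls = threading.local()
results = {}

def worker(name):
    tls.value = name        # stored per-thread, not shared
    local_copy = tls.value  # a local variable on this thread's own stack
    results[name] = local_copy

threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # each thread saw only the value it stored itself
```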

How multi-process and multithreading arose, and their status on Linux

Linux was positioned as a multitasking operating system from the very beginning; the concept of a process existed from the first version Linus Torvalds wrote. For example, the familiar init process has PID 1.

Threads were created to solve concurrency problems, and a thread is positioned as a lightweight process.

The Linux kernel had no separate concept of a thread until the 2.6 series; the minimum scheduling unit was the process. But Linux's design created good conditions for introducing threads: the famous fork system call, which starts a new process, is implemented via the kernel call clone, copying the address space and other resources. Linux creates a thread simply by calling clone with different flags. So, from the point of view of a modern kernel scheduler, the difference between a process and a thread is negligible.
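The fork half of this story can be sketched from Python on a POSIX system (this example assumes Linux/UNIX; `os.fork` does not exist on Windows). The child gets a copy of the parent's address space and exits with a status code the parent can read:

```python
import os

# fork() duplicates the calling process; under the hood, the C library
# implements it via the clone(2) system call with flags that request a
# copied address space. (POSIX-only: requires Linux or another UNIX.)
pid = os.fork()

if pid == 0:
    # Child process: it runs in its own copy of the parent's memory.
    os._exit(42)              # exit immediately with a status code
else:
    # Parent process: wait for the child and inspect its exit status.
    _, status = os.waitpid(pid, 0)
    print(os.WEXITSTATUS(status))  # 42
```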

Unfortunately, the threading mechanism that early Linux kernels obtained through these minor modifications was not fully POSIX-compliant, especially in signal handling, scheduling, and inter-process synchronization.

To bring Linux threads in line with the POSIX standard, two camps put in a great deal of effort: IBM led NGPT (Next Generation POSIX Threads), and Red Hat led NPTL (Native POSIX Thread Library). The competition ended in victory for NPTL, whose user-space API is the pthread family of APIs we use today. This battle with IBM also largely established Red Hat's leading role in the Linux industry.

Before NPTL became the de facto POSIX threading implementation on Linux, UNIX-like systems led by FreeBSD maintained a performance advantage over Linux, which is why many older companies used FreeBSD rather than Linux.

Why you cannot blindly solve concurrency problems with threads

As mentioned above, threads emerged to meet the growing demand for concurrent programming on Linux.

But as this section's title says, you cannot blindly solve concurrency problems with threads.

This is due to the cost of context switching. Even in the single-core era we had multitasking operating systems, yet a single-core CPU can run only one instruction of one process at a time. To achieve the "multitasking" users expect (for example, I am typing this text while iTunes plays music in the background and a download manager runs in my virtual machine), Linux cuts CPU time into time slices of varying sizes and, through the kernel's scheduling algorithm, lets tasks take turns occupying the precious CPU. Since a time slice is tiny, usually on the order of microseconds, the computer appears to us to be running "multiple tasks at once."

Context Switch

If a program has not finished its work when its time slice ends, then, sorry: it must store the data it needs to save (usually some CPU registers) into memory and go back to the end of the queue.

What, just a moment longer? No, no, no. A user-space process cannot bargain with the kernel.

Saving this state has a cost, and more seriously, it degrades the CPU's branch prediction and hurts system performance. So the context switch is something we try to avoid.

If too many processes or threads are created, context switches multiply and system performance suffers seriously.

Hence: "you cannot blindly solve concurrency problems with threads."

Introduction to coroutines

Roughly speaking, a coroutine is multitasking controlled by the user program itself on its own stack: it tries, as far as possible, to keep the program from losing its CPU time slice to external interrupts or I/O waits, achieving concurrency within a single process.

Coroutines were introduced early on to cope with highly concurrent connections: they mitigate the performance loss of context switching and implement asynchronous programming to some extent. However, because coroutine-style programs are obscure and hard to follow, they never became popular, even though the mechanism predates threads.

Here is Wikipedia's description of coroutines, for reference:

As of 2003, many of the most popular programming languages, including C and its descendants, had no direct support for coroutines in the language or their standard libraries. (This is largely a limitation of stack-based subroutine implementations.)

In situations where a coroutine would be the natural implementation strategy but none is available, a typical workaround is to create a subroutine that maintains internal state between calls with a collection of boolean flags and other state variables. Conditionals on the values of those state variables then produce different execution paths and subsequent function calls. Another typical solution is an explicit state machine in the form of a large and complex switch statement. Such implementations are difficult to understand and maintain.
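Python's generators are a restricted form of coroutine and make the contrast concrete: the execution state that an explicit state machine would track in flags is kept implicitly at the `yield` point. A minimal sketch (the tokenizer below is a made-up example):

```python
def token_stream(text):
    """A generator-based coroutine: its position in the loop is saved
    implicitly at each yield, instead of being tracked by an explicit
    state machine with flags and a big switch/if chain."""
    word = ""
    for ch in text:
        if ch.isspace():
            if word:
                yield word    # suspend here; resume on the next next()
                word = ""
        else:
            word += ch
    if word:
        yield word            # flush the final word

tokens = list(token_stream("to be  or not"))
print(tokens)  # ['to', 'be', 'or', 'not']
```

Each call to `next()` resumes the function exactly where it left off, which is precisely the behavior the flag-based workaround tries to simulate by hand.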

In today's mainstream programming environments, threads are the usual alternative to coroutines: they provide the ability to manage interacting pieces of code that execute "at the same time." Because they must solve a number of hard problems, threads include many powerful and complex facilities and have a correspondingly difficult learning curve. When all you need is a coroutine, using a thread can be overkill. However, unlike the other alternatives, in environments that support C, threads are widely available, familiar to many programmers, and well implemented, documented, and supported. POSIX defines a standard, well-specified thread implementation: pthreads.

In recent years, however, Go's efforts seem to have brought this ancient mechanism back to life.

Memory layout of a running program

First, a piece of background: the memory a program sees at runtime, that is, the addresses visible in user space, are not addresses in physical memory. Modern operating systems map virtual memory onto physical memory, so each process's address space is independent: the address 0x8000 in two different processes actually corresponds to two different physical addresses.



Second, each thread has its own independent stack, registers, and program counter; a process can have multiple threads, and threads of the same process share its memory space.
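The isolation between process address spaces can be sketched with the standard `multiprocessing` module (names here are illustrative; this relies on each child getting its own copy of the parent's memory): a mutation made in the child never shows up in the parent.

```python
import multiprocessing as mp

data = {"x": 1}

def child():
    # This writes to the child's own copy of the address space; the
    # parent's mapping of the same virtual addresses is untouched.
    data["x"] = 999

if __name__ == "__main__":
    p = mp.Process(target=child)
    p.start()
    p.join()
    print(data["x"])  # still 1 in the parent
```

Contrast this with threads: had `child` run in a `threading.Thread` instead, the parent would see `999`, because threads of one process share that memory.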

Choosing between multi-process and multithreaded designs

In Linux system programming, multi-process and multithreaded designs each have their place.

In most cases the choice is made according to their characteristics, the most important of which are the "sharing" and "isolation" properties discussed above.

Two examples:

The familiar memcached is a typical piece of multithreaded programming. One reason it is multithreaded rather than multi-process is that memcached's threads need to share the key-value data in memory, so multithreading is the inevitable choice.

The famous Nginx, by contrast, is a typical piece of multi-process programming. The HTTP requests Nginx handles are relatively independent, so there is not much data to share. What's more, Nginx must support restarting the server without interrupting service, which is convenient to implement in a multi-process architecture.

So the conclusion is: whether multi-process or multithreading is better must be analyzed and chosen based on the business scenario.

Python's GIL

GIL is the abbreviation of Global Interpreter Lock; as the name implies, it is a global lock in the Python interpreter. When the interpreter's author built the original "quick and dirty" prototype, many global variables were introduced; global variables require locking, and rather than lock each one, a single global lock was added. Well, roughly that is the story.

Later, as Python became popular, many module authors took the same shortcut: since the interpreter itself already had the GIL, their modules also freely introduced global state of their own.

From then on, the Python GIL was on a road of no return. The effect for Python programmers is that in a multithreaded Python program only one thread can run at a time; the threads spend their time fighting over the lock, and everyone gets hurt in the scramble.

David Beazley's article gives a very thorough exposition of the Python GIL and its performance implications:

Global Interpreter Lock

We will discuss this part in more depth in class.

The conclusions we can draw:

    • Python multithreading is counterproductive for CPU-bound work, but usable for I/O-bound work

    • Python multi-process programs can take advantage of multicore CPUs


Copyright belongs to the author, any reprint please contact the author for authorization.


This article is from the "Reboot DevOps Development" blog; please keep the source: http://opsdev.blog.51cto.com/2180875/1743281
