Python Interview Questions (VII)

Source: Internet
Author: User
Tags: add time, epoll, readable

1 What are LAN, WAN, and MAN (metropolitan area network)?

① LAN (Local Area Network): generally refers to a network covering a range within about 10 km, such as the network inside one building or one organization. Transmission distance directly affects transmission speed; because distances inside a LAN are short, transmission rates are generally high. Current LANs reach 10 Mb/s to 100 Mb/s, and high-speed LANs reach 1000 Mb/s.

② WAN (Wide Area Network): refers to a long-distance, large-scale computer network. Cross-regional, cross-city, and cross-country networks are all WANs. Because a WAN covers a wide area and connects many computers, it carries a very large amount of information and offers rich resource sharing. The Internet is the world's largest WAN, covering the entire globe.

③ MAN (Metropolitan Area Network): its coverage sits between that of a LAN and a WAN, generally referring to a network that covers a city.

2 What is a sticky packet? What causes sticky packets in sockets? Under what circumstances do sticky packets occur?

Socket sticky packets:

① When sending data with send, calling send several times in quick succession can produce a sticky packet: the socket layer may merge two sends into a single transmission, sticking them together.

② recv reads a fixed number of bytes per call; if the data remaining from the previous send is smaller than the size recv asks for, it is delivered together with the next send's data, and a sticky packet occurs.

Solutions:

① Option 1: insert a time.sleep delay between the two sends (not recommended).

② Option 2: add a conn.recv(1024) acknowledgement between the two sends:

Server side:

    conn.send(str(len(cmd_res.encode())).encode("utf-8"))  # first send the length of the payload
    client_ack = conn.recv(1024)                            # wait for the client to confirm
    conn.send(cmd_res.encode("utf-8"))                      # then send the payload itself

Client side:

    client.send("ready to receive, loser can send now".encode("utf-8"))

Principle: recv blocks while waiting for data, so the server automatically pauses in the middle; the client sends back the client_ack message, which inserts an extra one-to-one interaction between the two sends, so no sticky packet occurs.

    ③ Option 3: solve sticky packets with an if check on the remaining byte count (recommended), as the sketch after this snippet shows:
    while total_size > received_size:              # loop until all the data has arrived
        if total_size - received_size > 1024:      # more than one full buffer still remains
            size = 1024
        else:                                      # last chunk: take only what is left
            size = total_size - received_size
        data = conn.recv(size)                     # never read past the end of this message
        received_size += len(data)
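A more complete, runnable sketch of this length-prefix idea (the helper names are hypothetical; it assumes the peer sends a 4-byte big-endian length header before each payload):

    import struct

    def send_msg(sock, payload: bytes):
        # Prefix each message with a 4-byte big-endian length header.
        sock.sendall(struct.pack("!I", len(payload)) + payload)

    def recv_exactly(sock, total_size: int) -> bytes:
        chunks, received_size = [], 0
        while received_size < total_size:
            # Never ask for more than what remains, so the next message stays untouched.
            size = min(1024, total_size - received_size)
            data = sock.recv(size)
            if not data:
                raise ConnectionError("peer closed before the message was complete")
            chunks.append(data)
            received_size += len(data)
        return b"".join(chunks)

    def recv_msg(sock) -> bytes:
        (total_size,) = struct.unpack("!I", recv_exactly(sock, 4))
        return recv_exactly(sock, total_size)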

3 What is the role of I/O multiplexing?

What is a file descriptor? We all know that in the Unix(-like) world everything is a file. And what is a file? A file is just a stream of bytes: whether it is a socket, a FIFO, a pipe, or a terminal, to us everything is a file, everything is a stream.

When exchanging information, we send and receive data on these streams; this is called an I/O operation (input and output). To read data from a stream, the system call is read; to write data to it, the system call is write. But then, there are so many streams in a computer, how do we know which stream to operate on? The answer is the file descriptor, commonly abbreviated fd. An fd is an integer, and operating on this integer is operating on the corresponding file (stream). When we create a socket, the system call returns a file descriptor, and every subsequent operation on that socket is translated into an operation on this descriptor. It must be said that this is layered, abstract thinking.

Blocking? What does it mean for a program to block? Imagine this situation: you are waiting for a courier, but the courier has not arrived yet. What do you do? There are two options:

The courier hasn't arrived yet, so I can go to sleep first; when the courier arrives, they call me and I go pick up the package.

The courier hasn't arrived yet, so I keep calling the courier nonstop: "Damn it, why aren't you here yet? Hurry up!", and I don't stop until the package arrives.

Obviously, you couldn't stand the second way: it not only wastes your own time but also makes the courier very much want to hit you. In the computer world, these two scenarios correspond to blocking and non-blocking busy polling.

Non-blocking busy polling: when the data has not arrived, the process keeps checking for it, over and over, until the data arrives.

Blocking: when the data has not arrived, do nothing; only when the data arrives does processing continue.

Let's start from blocking: a blocked thread can handle only one socket's I/O at a time. If you want to handle multiple sockets, you can use the non-blocking busy-polling method. The pseudocode is as follows, with a concrete Python sketch after it:

    while true:
        for i in streams[]:
            if i has data:
                read until unavailable
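A minimal Python sketch of this busy-polling loop over non-blocking sockets (the socks list and the handle function are hypothetical placeholders):

    def busy_poll(socks):
        for s in socks:
            s.setblocking(False)      # recv() now raises instead of blocking
        while True:                   # spins even when nothing is readable, burning CPU
            for s in socks:
                try:
                    data = s.recv(1024)
                    if data:
                        handle(data)  # hypothetical handler for the received data
                except BlockingIOError:
                    continue          # no data on this socket yet; check the next one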

We can handle multiple streams by checking every stream from start to finish, but this is bad: if none of the streams has an I/O event, we waste CPU time slices for nothing. As one scientist said, every problem in computer science can be solved by adding a layer of indirection. So, to avoid this CPU spinning, we no longer let the thread check the streams itself; instead we introduce a proxy (first select, later poll). This proxy is powerful: it can observe the I/O events of many streams at the same time, and if there are no events, the proxy blocks, so the thread does not keep polling in a loop. The pseudocode is as follows, with a concrete select sketch after it:

    while true:
        select(streams[])  # blocks here until at least one stream has an I/O event
        for i in streams[]:
            if i has data:
                read until unavailable
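A minimal echo-server sketch using Python's select.select (the address and buffer size are arbitrary choices):

    import select
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", 8888))
    server.listen(5)

    inputs = [server]
    while inputs:
        # Blocks until at least one socket in `inputs` is readable.
        readable, _, _ = select.select(inputs, [], [])
        for s in readable:
            if s is server:                  # the listening socket: a new connection
                conn, addr = server.accept()
                inputs.append(conn)
            else:                            # a client socket: data or disconnect
                data = s.recv(1024)
                if data:
                    s.send(data)             # echo the data back
                else:
                    inputs.remove(s)         # client closed the connection
                    s.close()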

But there is still a problem: select only tells us that some I/O event happened, not which stream it happened on (it may be one, several, or even all of them). We can only scan all the streams indiscriminately, find the ones that can be read or written, and operate on them. So select has O(n) indiscriminate-polling complexity, and the more streams it handles, the longer the scan takes.

Epoll can be understood as "event poll". Unlike busy polling and indiscriminate polling, epoll notifies us of exactly which streams had which I/O events. So we say epoll is actually event-driven (each event is associated with an fd), and our operations on these streams now have a clear target. (The complexity drops to O(1).) The pseudocode is as follows, with a concrete epoll sketch after it:

    while true:
        active_streams[] = epoll_wait(epollfd)
        for i in active_streams[]:
            read or write until unavailable
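A minimal epoll-based sketch of the same echo server (Linux-only, since select.epoll is a Linux API; the address is arbitrary):

    import select
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", 8888))
    server.listen(5)
    server.setblocking(False)

    epoll = select.epoll()
    epoll.register(server.fileno(), select.EPOLLIN)   # watch for new connections
    connections = {}                                  # fd -> socket

    while True:
        # Blocks until an event occurs, and returns only the active fds.
        for fd, event in epoll.poll():
            if fd == server.fileno():                 # new incoming connection
                conn, addr = server.accept()
                conn.setblocking(False)
                epoll.register(conn.fileno(), select.EPOLLIN)
                connections[conn.fileno()] = conn
            elif event & select.EPOLLIN:              # a client socket is readable
                conn = connections[fd]
                data = conn.recv(1024)
                if data:
                    conn.send(data)                   # echo the data back
                else:                                 # client closed the connection
                    epoll.unregister(fd)
                    connections.pop(fd).close()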

As you can see, the biggest difference between select and epoll is that select only tells you that a certain number of streams have events, and you still have to poll them one by one to find which ones, whereas epoll tells you exactly what happened and, through the event, automatically locates the stream. Compared with select, epoll is a qualitative leap. I think this is also a case of sacrificing space in exchange for time; after all, hardware keeps getting cheaper.

I/O multiplexing

Well, we have covered a lot; now let's summarize what exactly I/O multiplexing is. Start from the I/O model: an input operation typically consists of two steps:

Wait for the data to be ready (waiting for the data to be ready). For sockets, this step means waiting for data to arrive from the network and be copied into a buffer in the kernel.

Copy the data from the kernel buffer to the process buffer (copying the data from the kernel to the process).

Next, let's look at the three commonly used I/O models:

Blocking I/O model: the most widely used model is the blocking I/O model; by default, all sockets are blocking. The process calls the recvfrom system call and stays blocked the entire time until the data has been copied into the process buffer (unless the system call is interrupted and returns early).

Non-blocking I/O model: when we set a socket to non-blocking, we are telling the kernel not to put the process to sleep when a requested I/O operation cannot be completed, but to return an error instead. When the data is not ready, the kernel immediately returns an EWOULDBLOCK error; when the system call is made again and the data is ready, the data is copied into the process buffer. This involves one recurring action: polling.

I/O multiplexing model: this model uses the select and poll functions, which also cause the process to block. select blocks first and returns once there is an active socket. But unlike blocking I/O, these two functions can block on multiple I/O operations at the same time, and can detect multiple read and write I/O functions simultaneously, not returning until data is readable or writable (that is, they listen on multiple sockets). When select is called, the process is blocked while the kernel monitors all the sockets select is responsible for; as soon as any socket's data is ready, select returns that the socket is readable, and we can then call recvfrom to fetch the data. Because blocking I/O can block only one I/O operation, while the I/O multiplexing model can block on multiple I/O operations at once, it is called multiplexing.

4 Briefly describe the differences between processes, threads, and coroutines, and their application scenarios?

Threads and processes:

A thread is part of a process; threads run within the process's address space; threads created by the same process share the same memory space; and when a process exits, the threads it created are forced to exit and are cleaned up. A thread can share all the resources owned by its process with the other threads of that process, but threads themselves own essentially no system resources, only the small amount of state that is essential at run time (such as a program counter, a set of registers, and a stack).

Coroutines vs. threads and processes: thread and process operations are triggered by the program through system interfaces, and the final executor is the operating system; coroutine operations, by contrast, are scheduled by the program itself.

Why coroutines exist: in a multithreaded application, the CPU switches between threads via time slicing, and each thread switch costs time (saving state, then resuming later). A coroutine uses only one thread, and the execution order of code blocks is specified within that one thread. Application scenario: when a program contains a large number of operations that do not need the CPU (IO), coroutines are a good fit, as the sketch below shows.
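A minimal asyncio sketch of that scenario (the coroutine names and delays are made up): the event loop switches between coroutines while one of them is waiting on IO, all inside a single thread.

    import asyncio

    async def fetch(name, delay):
        await asyncio.sleep(delay)   # stands in for a network or disk wait
        return f"{name} done"

    async def main():
        # Both "requests" wait concurrently; total time is about max(delay), not the sum.
        results = await asyncio.gather(fetch("a", 1), fetch("b", 1))
        print(results)

    asyncio.run(main())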

5 What exactly is the GIL?

What we call the Python Global Interpreter Lock (GIL) is, simply put, a mutex (or lock) that allows only one thread at a time to control the Python interpreter.

This means that only one thread can be in the executing state at any point in time. The GIL has no significant impact on programs performing single-threaded tasks, but it becomes a performance bottleneck for compute-intensive (CPU-bound) multithreaded tasks.

Since the GIL allows only one thread to run at a time, even on machines with multiple CPU cores running multithreaded code, it has earned a "notorious" reputation among Python's many features.

In this article, you will learn how the GIL affects the performance of your Python programs and how to mitigate its impact on your code.

What problem does the GIL solve for Python? Python uses reference counting for memory management, which means that objects created in Python have a reference-count variable that tracks the number of references pointing to the object. When this count reaches 0, the memory occupied by the object is freed.

Let's show how reference counting works with a simple snippet (reconstructed here to match the description below):
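    import sys

    a = []                       # the empty list now has one reference: a
    b = a                        # a second reference: b
    print(sys.getrefcount(a))    # prints 3: a, b, and the argument passed to getrefcount()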

In the above example, the reference count of the empty list object [] is 3: the list object is referenced by a, by b, and by the argument passed to sys.getrefcount().

Back to the GIL itself:

The problem is that this reference-count variable needs to be protected from race conditions where two threads increment or decrement it at the same time. If that happens, it can cause leaked memory that is never freed or, worse, memory that is freed incorrectly while references to the object still exist. This can crash a Python program or introduce all kinds of weird bugs.

The reference-count variable can be kept safe by adding locks to all data structures that are shared across threads, so that they are not modified inconsistently.
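A generic sketch of that idea with threading.Lock (this illustrates locking shared state in general, not CPython's actual internals):

    import threading

    counter = 0
    lock = threading.Lock()

    def increment():
        global counter
        for _ in range(100_000):
            with lock:           # serialize access so no increments are lost
                counter += 1

    threads = [threading.Thread(target=increment) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)               # always 400000 thanks to the lock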

However, adding a lock to each object or group of objects means multiple locks will exist, which leads to another problem: deadlocks (which can occur only when there is more than one lock). Another side effect is the performance degradation caused by repeatedly acquiring and releasing locks.

The GIL is a single lock on the interpreter itself, with a rule that executing any Python bytecode requires acquiring the interpreter lock. This effectively prevents deadlocks (because there is only one lock) and does not introduce much performance overhead. But it does effectively make every compute-intensive task single-threaded.

The GIL is also used by interpreters for other languages (such as Ruby), but it is not the only way to solve this problem. Some languages avoid requiring a GIL for thread-safe memory management by using approaches other than reference counting, such as garbage collection.

On the other hand, this also means those languages often have to compensate for the loss of the GIL's single-threaded performance benefits by adding other performance-boosting features, such as JIT compilers.

Why was the GIL chosen as the solution? So why was a technique that seems like such a stumbling block used in Python? Was it a bad decision by Python's developers?

As Larry Hastings put it, the design decision of the GIL is one of the key reasons Python is as popular as it is today.

Python has been around since the days when operating systems did not yet have the concept of threads. Python was designed to be easy to use in order to make development faster, which drew more and more programmers to it.

Many extensions were written for existing C libraries whose features Python needed, and to prevent inconsistent changes, these C extensions required thread-safe memory management, which the GIL provided.

The GIL is simple to implement and was easy to add to Python. And because only one lock needs to be managed, it yields a performance gain for single-threaded tasks.

Non-thread-safe C libraries became easier to integrate, and these C extensions became one of the reasons Python was adopted by so many different communities.

As you can see, the GIL was a pragmatic solution to a difficult problem the CPython developers faced early in Python's life.

Effects on multithreaded Python programs: when you look at typical Python programs (or any computer programs), you will find that performance differs between compute-intensive and I/O-intensive tasks.

Compute-intensive tasks are those that push the CPU to its limit, including mathematical computations such as matrix multiplication, searching, image processing, and so on.

I/O-intensive tasks are tasks that spend time waiting for input and output from users, files, databases, networks, and so on. I/O-intensive tasks can sometimes wait a very long time before they get what they need from their source, because the source may have to do its own processing before the input or output is ready: for example, a user thinking at an input prompt, or a database query running in its own process.

Let's take a look at a simple compute-intensive program that performs a countdown (the snippet below is a minimal reconstruction; the COUNT value is illustrative):
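    import time

    COUNT = 50_000_000           # illustrative workload size

    def countdown(n):
        while n > 0:
            n -= 1

    start = time.time()
    countdown(COUNT)
    end = time.time()
    print('Time taken in seconds -', end - start)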

Running this on my 4-core system prints the elapsed time for the single-threaded version.

Next, I fine-tuned the code to perform the same countdown using two threads in parallel (again a minimal reconstruction):
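    import time
    from threading import Thread

    COUNT = 50_000_000           # same total workload, split across two threads

    def countdown(n):
        while n > 0:
            n -= 1

    t1 = Thread(target=countdown, args=(COUNT // 2,))
    t2 = Thread(target=countdown, args=(COUNT // 2,))

    start = time.time()
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    end = time.time()
    print('Time taken in seconds -', end - start)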

Then I ran it again and compared the elapsed times.

As you can see, both versions take a similar amount of time to finish. In the multithreaded version, the GIL prevents the compute-intensive threads from executing in parallel.

The GIL has little impact on the performance of I/O-intensive multithreaded programs, because the lock can be shared between threads while they are waiting for I/O.

However, a program whose threads are entirely compute-intensive (for example, one that uses threads to process parts of an image) will not only become single-threaded because of the lock, but may also take noticeably longer to execute than its fully single-threaded equivalent, as in the example above.

This increase in execution time is due to the acquisition and release overhead associated with the lock.

How do I deal with the GIL in Python? If the GIL is causing you trouble, here are a few approaches you can try:

Multiprocessing vs multithreading: the most popular approach is to use multiple processes instead of multiple threads. Each Python process gets its own Python interpreter and memory space, so the GIL is not a problem. Python's multiprocessing module makes it easy to create multiple processes, along these lines (a minimal reconstruction):
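    import time
    from multiprocessing import Pool

    COUNT = 50_000_000            # same total workload, split across two processes

    def countdown(n):
        while n > 0:
            n -= 1

    if __name__ == '__main__':
        pool = Pool(processes=2)  # each worker has its own interpreter and its own GIL
        start = time.time()
        r1 = pool.apply_async(countdown, [COUNT // 2])
        r2 = pool.apply_async(countdown, [COUNT // 2])
        pool.close()
        pool.join()               # wait for both workers to finish
        end = time.time()
        print('Time taken in seconds -', end - start)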

Running this on the same system shows a performance improvement over the multithreaded version.

But the time does not drop to half of the previous version, because process management has its own overhead. Multiple processes are "heavier" than multiple threads, so keep in mind that this could become a scaling bottleneck.

Alternative Python interpreters: Python has several interpreter implementations; CPython, Jython, IronPython, and PyPy, written in C, Java, C#, and Python respectively, are the most popular. The GIL exists only in the traditional implementation, CPython. If your program and its libraries can run on another implementation, you can try that.

Wait and see: many Python users benefit from the GIL's single-threaded performance advantages. Multithreading programmers need not despair either, because some of the brightest minds in the Python community are working on removing the GIL from CPython; one such attempt is the Gilectomy.

The Python GIL is often considered a mysterious and difficult topic. But remember: as a Pythonista, you are affected by it only if you are writing C extensions or have compute-intensive multithreaded tasks in your programs.

In that case, this article should have given you everything you need to understand what the GIL is and how to handle it in your own projects. And if you want to understand the low-level inner workings of the GIL, I recommend watching David Beazley's Understanding the Python GIL talk.
