Talking about Python asynchronous IO, synchronous IO, threads and processes



Threads inside one process share memory and can communicate with each other directly, but child processes under a parent process cannot contact each other on their own. Inter-process communication is still possible through a workaround such as multiprocessing.Queue, whose usage is basically the same as that of the queue used with threads. Example:
import threading
from multiprocessing import Process, Queue

def thre(qq):
    qq.put([1, 'xixi', 2])

if __name__ == '__main__':
    q = Queue()
    # Process memory is independent, so the child cannot reach the parent's objects
    # directly: q must be passed in as an argument, and what the child receives is in
    # effect a copy of the Queue instance. If you skip the argument the way a thread
    # could, a "not defined" error is raised, because memory is not shared.
    p = Process(target=thre, args=(q,))
    # p = threading.Thread(target=thre)  # a Thread could call thre without passing
    # anything (with def thre(): ...) because threads share the same address space;
    # passing arguments works too, of course.
    p.start()
    print(q.get())

Processes cannot contact each other on their own initiative, which is a real limitation; it is like QQ and Word. If you want them to "communicate", you copy text from Word into QQ (or copy text from QQ into Word): the two appear connected, but in fact the text has merely been cloned from one to the other. Inter-process communication in Python through a Queue works the same way: internally, Queue uses pickle to serialize the data and pass it across.
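As a rough illustration (my own sketch, not from the original code), what crosses the process boundary is a pickled copy, which is why the received object is equal to, but not the same as, the one that was sent:

import pickle

data = [1, 'xixi', 2]
wire = pickle.dumps(data)    # the bytes that effectively cross the process boundary
clone = pickle.loads(wire)   # the receiving side rebuilds a copy from the bytes
print(clone == data, clone is data)  # True False: same content, different object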
Another option is Pipe, which passes data through a pipeline; instantiating Pipe() likewise creates the objects used for communication.
from multiprocessing import Process, Pipe

def f(conn):
    conn.send('balabala')
    print(conn.recv())

if __name__ == '__main__':
    # instantiating Pipe() returns two connection objects, one end for the parent
    # and one for the child
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    parent_conn.send('babababa')

In this way, data can be transmitted, but not shared.
To achieve sharing between processes, you need to use a Manager.
from multiprocessing import Process, Manager
import os

def thre(dd, ll):
    dd[os.getpid()] = os.getppid()
    ll.append(os.getpid())
    print(ll)
    print(dd)

if __name__ == '__main__':
    manager = Manager()
    d = manager.dict()
    l = manager.list(range(3))
    t_list = []
    for i in range(10):
        p = Process(target=thre, args=(d, l))
        p.start()
        t_list.append(p)
    for res in t_list:
        res.join()
Here the dictionary d and the list l can be modified by all of the processes at the same time. Because I used os.getpid(), each process writes different data; if every process wrote the same data, the final dictionary would hold just one key-value pair, and the list would contain 10 identical entries.

Process locks exist so that multiple processes printing to the same screen do not garble each other's output. That's all.
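A minimal sketch of that (my example, not from the original article): with the lock held, each process finishes its print before another starts, so lines do not interleave on the screen.

from multiprocessing import Process, Lock

def show(lock, n):
    with lock:  # only one process may hold the lock, so prints do not interleave
        print('process', n)

if __name__ == '__main__':
    lock = Lock()
    for i in range(5):
        Process(target=show, args=(lock, i)).start()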

A process pool plays a role similar to a semaphore in threads: it limits how many processes may run simultaneously. Pool provides apply and apply_async; apply runs tasks serially, apply_async runs them in parallel.
from multiprocessing import Pool
import time, os

def thre(dd):
    time.sleep(1)
    print('the process:', os.getpid())
    return dd + 100

def g(c):
    print('hahaha', c, os.getpid())

if __name__ == '__main__':
    p_ = Pool(3)  # at most 3 worker processes run at the same time
    print(os.getpid())
    for i in range(10):
        # callback is the callback function; the argument passed to it is the
        # return value of thre
        p_.apply_async(func=thre, args=(i,), callback=g)
    p_.close()
    # Without join the main program exits immediately and the parallel workers die
    # with it; with join the main process waits for the children to finish. This
    # only matters for apply_async (parallel); for apply (serial) it has no effect.
    # close() must be called before join().
    p_.join()
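For comparison, a sketch of the serial variant (my example, reusing the same kind of worker): apply blocks until the function returns, so the ten calls take roughly ten seconds in a row.

from multiprocessing import Pool
import time

def thre(dd):
    time.sleep(1)
    return dd + 100

if __name__ == '__main__':
    p_ = Pool(3)
    for i in range(10):
        # apply blocks the main process until thre returns, so this loop is serial
        print(p_.apply(func=thre, args=(i,)))
    p_.close()
    p_.join()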

 

Coroutine: it can achieve high concurrency while being, in essence, a single thread; one CPU can support tens of thousands of concurrent coroutines. Two common libraries are gevent (switching is triggered automatically) and greenlet (switching is triggered manually).
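For the manual case, a minimal greenlet sketch (my example, not from the original article): each switch() hands control over explicitly.

from greenlet import greenlet

def t1():
    print(12)
    gr2.switch()  # manually hand control to gr2
    print(34)

def t2():
    print(56)
    gr1.switch()  # manually switch back into t1
    print(78)     # never reached: t1 finishes and control returns to the main greenlet

gr1 = greenlet(t1)
gr2 = greenlet(t2)
gr1.switch()      # prints 12, 56, 34

gevent builds on greenlet and switches automatically whenever a coroutine hits (simulated) IO: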
import gevent

def fun1():
    print('running 1 ...')
    gevent.sleep(2)  # simulate io
    print('running 2 ...')

def fun2():
    print('running 3 ...')
    gevent.sleep(3)
    print('running 4')

def fun3():
    print('running 5 ...')
    gevent.sleep(0)
    print('end?')

gevent.joinall([gevent.spawn(fun1), gevent.spawn(fun2), gevent.spawn(fun3)])
Running result:
running 1 ...
running 3 ...
running 5 ...
end?
running 2 ...
running 4
----------------------
sleep here acts like a trigger button: when one coroutine hits a sleep (simulated IO), execution switches to the next function. The duration says how long that coroutine stays parked; sleep(3) parks it for 3 seconds, while statements that are not stuck execute immediately. Coroutines work very well for concurrent crawlers.
import gevent, time
import urllib.request as ul
from gevent import monkey

# Patch the standard library so gevent can switch on its IO; without this, gevent
# does not recognize urllib's IO operations and the spawned tasks run serially.
monkey.patch_all()

def f(url):
    print('GET %s' % url)
    res = ul.urlopen(url).read()
    print('recv bytes %s from %s' % (len(res), url))

time_start = time.time()
l = ['https://www.python.org/', 'http://km.58.com/',
     'http://kan.sogou.com/dongman/', 'http://news.sohu.com/']
for i in l:
    f(i)
print('synchronization time:', time.time() - time_start)

async_time = time.time()
gevent.joinall([gevent.spawn(f, 'https://www.python.org/'),
                gevent.spawn(f, 'http://km.58.com/'),
                gevent.spawn(f, 'http://kan.sogou.com/dongman/'),
                gevent.spawn(f, 'http://news.sohu.com/')])
print('asynchronous time:', time.time() - async_time)

Running result:
GET https://www.python.org/
recv bytes 48860 from https://www.python.org/
GET http://km.58.com/
recv bytes 104670 from http://km.58.com/
GET http://kan.sogou.com/dongman/
recv bytes 12713 from http://kan.sogou.com/dongman/
GET http://news.sohu.com/
recv bytes 170935 from http://news.sohu.com/
synchronization time: 3.780085563659668
GET https://www.python.org/
GET http://km.58.com/
GET http://kan.sogou.com/dongman/
GET http://news.sohu.com/
recv bytes 12690 from http://kan.sogou.com/dongman/
recv bytes 170935 from http://news.sohu.com/
recv bytes 104670 from http://km.58.com/
recv bytes 48860 from https://www.python.org/
asynchronous time: 2.5934762954711914

User space and kernel space
Modern operating systems use virtual memory. The core of the operating system is the kernel: it is independent of ordinary applications, and it can access the protected memory space as well as the hardware devices. To ensure that user processes cannot operate on the kernel directly and to keep the kernel secure, the operating system divides the virtual address space into two parts: kernel space and user space.

Process Switching
To control the execution of processes, the kernel must be able to suspend a process that is running on the CPU and resume a previously suspended one. This behavior is called process switching; any process runs with the support of the kernel and is closely linked to it.
Switching from one running process to another means saving the current process's context and restoring the saved context of the next one; the next time the first process runs, it resumes from the saved position.

Process blocking:
When an event the process expects has not yet happened, such as a failed request for a system resource, waiting for some operation to complete, data that has not yet arrived, or no new work to do, the process itself executes the blocking primitive and changes from the running state to the blocked state, pausing to wait.
Process blocking is therefore an active action of the process itself: only a process in the running state (holding the CPU) can switch to the blocked state, and a blocked process consumes no CPU resources.

Cache I/O
Cache I/O is also called standard I/O, and most file system I/O operations default to it. In the Linux cache I/O mechanism, the operating system caches I/O data in the file system's page cache: the data is first copied into the kernel's buffer, and then copied from the kernel buffer into the user process's memory, i.e. the application's address space. The disadvantage is that data is copied repeatedly between the user process's address space and kernel space, which is expensive in both CPU and memory.


I/O mode
Synchronous IO and asynchronous IO:
Synchronous IO includes blocking I/O, non-blocking I/O, multiplexing I/O (I/O multiplexing), and signal-driven I/O (not common in practice; no notes on it here).
Asynchronous I/O is its own category.

Blocking IO: the process initiates a request and then waits, blocked, while the kernel prepares the data; once the data is ready it blocks again while the kernel copies it into user space, until all of the data has been delivered to the user process (the client).
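A minimal sketch of blocking IO (my example; it assumes www.python.org is reachable):

import socket

s = socket.socket()
s.connect(('www.python.org', 80))  # blocks until the connection is established
s.send(b'GET / HTTP/1.0\r\nHost: www.python.org\r\n\r\n')
data = s.recv(1024)  # blocks until the kernel has data and has copied it to us
print(data[:20])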

Non-blocking I/O: after initiating a request, the process polls constantly. While the data is not ready, the kernel does not block the caller; it returns an error to the user process instead. The process checks the result and, on an error, sends the request again, back and forth (because it is not blocked in the meantime, it can do other things). Only once the data is ready and the kernel starts copying it into user space does the process actually block, for the duration of the copy; finally the user receives the complete data.
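A sketch of that polling loop (my example, same assumptions as above):

import socket, time

s = socket.socket()
s.connect(('www.python.org', 80))
s.setblocking(False)  # recv now returns an error instead of blocking
s.send(b'GET / HTTP/1.0\r\nHost: www.python.org\r\n\r\n')
while True:
    try:
        data = s.recv(1024)  # data is ready: the copy itself still blocks briefly
        break
    except BlockingIOError:
        time.sleep(0.1)  # data not ready yet; the process is free to do other work
print(data[:20])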

Multiplexing I/O: initiate hundreds of request links at a time; whichever link has data ready, the user process is notified and starts receiving, and at that point the kernel copy begins for that link until the process has the complete data (the copy itself still blocks). The core of this mode is that it is driven by non-blocking I/O underneath, which is what makes multiplexing highly concurrent.

Asynchronous I/O: this one is awesome. When the process initiates a request, it gets an immediate reply of "go do your other things", so it carries on elsewhere without any blocking. When the data arrives, the kernel copies it entirely in the background; once everything is done, the data is delivered right to the door and the process is sent a signal, and the user process simply takes the finished data. Through the entire flow, nothing blocks at all! That is asynchronous IO.
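In Python, the closest built-in programming model is asyncio (my sketch, not from the original article; under the hood it runs an event loop over multiplexed IO, but from the caller's side you fire requests and the results are delivered when ready):

import asyncio

async def fetch(n):
    await asyncio.sleep(1)  # stands in for an IO request completing in the background
    return n + 100

async def main():
    # fire three requests at once; nothing here blocks while they are in flight
    results = await asyncio.gather(fetch(1), fetch(2), fetch(3))
    print(results)  # [101, 102, 103] after ~1 second total, not 3

asyncio.run(main())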

Selectors
The selectors module encapsulates select, poll, and epoll. A detailed example:
import selectors, socket

sel = selectors.DefaultSelector()

def accept(sock, mask):
    conn, addr = sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn, mask):
    data = conn.recv(1024).decode()
    if data:
        conn.send(('haha + %s' % data).encode())
    else:
        print('what?', conn)  # empty data means the client closed the connection
        sel.unregister(conn)
        conn.close()

sock = socket.socket()
sock.bind(('localhost', 5000))
sock.listen(1000)
sock.setblocking(False)  # the listening socket must not block the event loop
sel.register(sock, selectors.EVENT_READ, accept)
while True:
    events = sel.select()  # waits until at least one registered fd is ready
    for key, mask in events:
        callback = key.data  # the function passed at register time
        callback(key.fileobj, mask)
With this, a single thread can serve many connections concurrently.
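To try it out, a minimal client sketch (my example, not part of the original article):

import socket

c = socket.socket()
c.connect(('localhost', 5000))
c.send(b'hello')
print(c.recv(1024).decode())  # expected reply: haha + hello
c.close()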
