Precautions for sharing file objects between multiprocessing parent and child processes in Python

Source: Internet
Author: User
multiprocessing is Python's multi-process module, and Process is its workhorse for spawning processes. The problem discussed today, however, deserves some of our attention.

Directly run the code:

    from multiprocessing import Process, Lock

    err_file = 'error1.log'
    err_fd = open(err_file, 'w')

    def put(fd):
        print "PUT"
        fd.write("hello, func put write\n")
        print "END"

    if __name__ == '__main__':
        p_list = []
        for i in range(1):
            p_list.append(Process(target=put, args=(err_fd,)))
        for p in p_list:
            p.start()
        for p in p_list:
            p.join()

The intent of the code above is clear: use multiprocessing.Process to spawn a process that executes the put function. What put does is equally clear: it prints PUT and END and writes "hello, func put write" to the file error1.log.

In theory, the output should be PUT and END as described, and error1.log should then contain the line "hello, func put write". Things do not always go as expected, though. The actual result is:

    [root@iZ23pynfq19Z ~]# py27 2.py ; cat error1.log
    PUT
    END
    [root@iZ23pynfq19Z ~]#

What!? Why is error1.log empty!?

Let's adjust the code a little and witness the magic:

    from multiprocessing import Process, Lock

    err_file = 'error1.log'
    err_fd = open(err_file, 'w')

    def put(fd):
        print "PUT"
        fd.write("hello, func put write\n")
        fd.write("o" * 4075)  # magic
        print "END"

    if __name__ == '__main__':
        p_list = []
        for i in range(1):
            p_list.append(Process(target=put, args=(err_fd,)))
        for p in p_list:
            p.start()
        for p in p_list:
            p.join()

Output result:

    [root@iZ23pynfq19Z ~]# py27 2.py ; cat error1.log
    PUT
    END
    hello, func put write
    o... (4075)[root@iZ23pynfq19Z ~]#

Feels a bit like magic, doesn't it!?

Now two questions arise:

  1. Why can't the first program write that sentence, while the second one can?

  2. What is that 4075?

Before answering these questions, we need to understand the three buffering modes of the standard I/O library: fully buffered, line buffered, and unbuffered.

For details, see the earlier blog post: https://my.oschina.net/u/2291...

Because we are writing to a regular file, the standard I/O library uses full buffering: data is flushed to the system write queue only once the buffer has been filled.

That answers both questions at once. The pile of 'o' characters filled the entire buffer, so the library flushed our content into the write queue. As for 4075, it is 4096 - len("hello, func put write\n") + 1. Why add 1? Because exactly filling the buffer is not enough: the write is triggered only when the data exceeds the buffer size.
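The arithmetic can be checked with a few lines of Python. This is only a small sketch: the 4096-byte buffer size is the value assumed in the explanation above (it may differ on other systems), and the names BUFFER_SIZE, message and padding are made up for illustration.

```python
# Assumed stdio buffer size, taken from the explanation above.
BUFFER_SIZE = 4096

message = "hello, func put write\n"

# Exactly filling the buffer is not enough, so one extra byte is needed.
padding = BUFFER_SIZE - len(message) + 1

print(len(message))  # 22
print(padding)       # 4075
```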

So now we have the answer. If we want a multiprocessing.Process child to write to a file in the manner shown above, there are three ways to make it work:

  • Fill the buffer

  • Manually call flush()

  • Make the file object unbuffered

The first and second have been described above, so let's briefly talk about the third:

From the official Python documentation: open(name[, mode[, buffering]]) ... The optional buffering argument specifies the file's desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used. [2]

This means that when we open the file, we can set buffering to 0 to select unbuffered mode. Every write then goes directly to the write queue instead of the buffer. (This is the method with the lowest performance.)
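The second and third approaches can be sketched as follows. This is a hedged illustration rather than the article's original code: it uses Python 3 syntax, it assumes a Unix system where the "fork" start method is available, and the helper names (put_with_flush, put_unbuffered, run_in_child, demo) are made up for this example. Note that in Python 3, buffering=0 is only allowed for files opened in binary mode.

```python
import multiprocessing
import os
import tempfile

MESSAGE = "hello, func put write\n"


def put_with_flush(fd):
    fd.write(MESSAGE)
    # Push the user-space buffer to the OS before the child exits.
    fd.flush()


def put_unbuffered(fd):
    # fd was opened with buffering=0, so there is no user-space buffer to lose.
    fd.write(MESSAGE.encode("ascii"))


def run_in_child(target, fd):
    # Assumption: the "fork" start method, so the child inherits the open file.
    ctx = multiprocessing.get_context("fork")
    child = ctx.Process(target=target, args=(fd,))
    child.start()
    child.join()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        flushed_path = os.path.join(tmp, "flushed.log")
        unbuffered_path = os.path.join(tmp, "unbuffered.log")
        with open(flushed_path, "w") as fd:
            run_in_child(put_with_flush, fd)
        with open(unbuffered_path, "wb", buffering=0) as fd:
            run_in_child(put_unbuffered, fd)
        with open(flushed_path) as f:
            flushed = f.read()
        with open(unbuffered_path, "rb") as f:
            unbuffered = f.read()
    return flushed, unbuffered


if __name__ == "__main__":
    print(demo())
```

In both cases the child's data reaches the file even though the child never closes it.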

------------------------------------------------

Having covered the phenomenon and the workarounds, let's dig into the cause.

I believe we have all seen that a file is still written correctly even when we never explicitly close the file object or call flush(). So what is different here?

In fact, when a program exits normally, the process does some cleanup for us on the way out, such as closing open file descriptors, removing temporary files, and freeing memory. Thanks to these "good habits", the buffered data is flushed into the write queue when the file is closed, so the file content is not lost.

With that in mind, let's look back at the problem. A reasonable guess is that when the child process that called put exits, the file is never closed, so the data still sitting in the buffer is lost.

Let's look at the implementation of Process.

    multiprocessing/process.py

    def start(self):
        '''
        Start child process
        '''
        assert self._popen is None, 'cannot start a process twice'
        assert self._parent_pid == os.getpid(), \
               'can only start a process object created by current process'
        assert not _current_process._daemonic, \
               'daemonic processes are not allowed to have children'
        _cleanup()
        if self._Popen is not None:
            Popen = self._Popen
        else:
            from .forking import Popen
        self._popen = Popen(self)
        _current_process._children.add(self)

Now let's see how Popen works:

    multiprocessing/forking.py

    class Popen(object):

        def __init__(self, process_obj):
            sys.stdout.flush()
            sys.stderr.flush()
            self.returncode = None

            self.pid = os.fork()
            if self.pid == 0:
                if 'random' in sys.modules:
                    import random
                    random.seed()
                code = process_obj._bootstrap()
                sys.stdout.flush()
                sys.stderr.flush()
                os._exit(code)

The key point is the final os._exit(code). Why is it the most important line? Because the way the process exits determines what cleanup gets done.

What is os._exit? It is in fact the C library's _exit, so let's take a quick look at that.

https://my.oschina.net/u/2291...

The link above makes it clear that _exit() and exit() are two different things. _exit() is blunt: it discards whatever is still in user space and goes straight into the kernel, while exit() patiently cleans up for us first.

So what if Popen did not exit through os._exit?

Fortunately, sys.exit() is the exit() we want, the one that cleans up first. Let's try it!
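The difference between the two exit paths can be observed without touching multiprocessing at all. The following is a minimal sketch in Python 3 syntax, assuming CPython: it launches a short child script with subprocess, lets it write one buffered line, and ends it either with os._exit(0) or with sys.exit(0). The names SCRIPT and run_and_read are hypothetical helpers invented for this example.

```python
import os
import subprocess
import sys
import tempfile

# Child script: write one buffered line, then exit in the requested way.
SCRIPT = """
import os, sys
f = open(sys.argv[1], "w")
f.write("hello, func put write\\n")
{exit_call}
"""


def run_and_read(exit_call):
    """Run the child script with the given exit call and return the file content."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        code = SCRIPT.format(exit_call=exit_call)
        subprocess.check_call([sys.executable, "-c", code, path])
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    # os._exit skips interpreter cleanup, so the buffered line never reaches the file.
    print(repr(run_and_read("os._exit(0)")))
    # sys.exit goes through normal shutdown, which flushes and closes the file.
    print(repr(run_and_read("sys.exit(0)")))
```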

    multiprocessing/forking.py

    class Popen(object):

        def __init__(self, process_obj):
            sys.stdout.flush()
            sys.stderr.flush()
            self.returncode = None

            self.pid = os.fork()
            if self.pid == 0:
                if 'random' in sys.modules:
                    import random
                    random.seed()
                code = process_obj._bootstrap()
                sys.stdout.flush()
                sys.stderr.flush()
                #os._exit(code)
                sys.exit(code)

Now test with the original version of the code, without the 'o' padding:

    [root@iZ23pynfq19Z ~]# python 2.py ; cat error1.log
    PUT
    END
    hello, func put write

As we can see, the data is now written, which confirms the explanation above.

However, it is better not to modify the library source unless you really have to. After all, this code is the result of years of refinement by experienced developers, who may well have written it this way on purpose to avoid other problems. It is better to discipline your own code and avoid implementations that look non-standard.
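One way to keep the code "standard" without patching the library is to let the child open and close the file itself instead of inheriting the parent's file object. This is only a sketch of that idea, not something the original article tested: it uses Python 3 syntax, assumes the "fork" start method on Unix, and the names put and demo are chosen for illustration.

```python
import multiprocessing
import os
import tempfile


def put(path):
    # The with block closes the file, and therefore flushes it,
    # before the child process exits through os._exit.
    with open(path, "a") as fd:
        fd.write("hello, func put write\n")


def demo():
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        # Assumption: the "fork" start method is available (Unix).
        ctx = multiprocessing.get_context("fork")
        child = ctx.Process(target=put, args=(path,))
        child.start()
        child.join()
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(repr(demo()))
```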
Comments and discussion are welcome. When reprinting, please cite: https://segmentfault.com/a/11...


