multiprocessing.Process: considerations when parent and child processes share file objects in Python

Source: Internet
Author: User
multiprocessing is Python's multi-process module, and Process is its darling for creating child processes. The issue we discuss today, though, deserves some close attention.

Straight to the code:

    from multiprocessing import Process, Lock

    err_file = 'error1.log'
    err_fd = open(err_file, 'w')

    def put(fd):
        print "Put"
        fd.write("hello, func put write\n")
        print "END"

    if __name__ == '__main__':
        p_list = []
        for i in range(1):
            p_list.append(Process(target=put, args=(err_fd,)))
        for p in p_list:
            p.start()
        for p in p_list:
            p.join()

The intent of the code above is clear: use multiprocessing.Process to spawn a child process that runs the put function. What put does is equally clear: print Put and END, and write "hello, func put write" to the file error1.log.

So the output should be exactly as described: Put and END on the screen, and the sentence "hello, func put write" in error1.log. However, the world is never that simple. The actual result is:

    [root@iz23pynfq19z ~]# py27 2.py; cat error1.log
    Put
    END
    [root@iz23pynfq19z ~]#

What!? Why is there nothing in error1.log!?

Let's tweak the code a little and witness something magical:

    from multiprocessing import Process, Lock

    err_file = 'error1.log'
    err_fd = open(err_file, 'w')

    def put(fd):
        print "Put"
        fd.write("hello, func put write\n")
        fd.write("o" * 4075)  # the magic line
        print "END"

    if __name__ == '__main__':
        p_list = []
        for i in range(1):
            p_list.append(Process(target=put, args=(err_fd,)))
        for p in p_list:
            p.start()
        for p in p_list:
            p.join()

Output:

    [root@iz23pynfq19z ~]# py27 2.py; cat error1.log
    Put
    END
    hello, func put write
    o.... (there are 4,075 of them)
    [root@iz23pynfq19z ~]#

Feeling thoroughly confused yet!?

Now two questions come to mind:

    1. Why can't the first program write that sentence, while the second one can?

    2. What on earth is that 4075?

Before explaining these, we need to be clear about the buffering modes of the standard I/O library: full buffering, line buffering, and no buffering.

For details, see the earlier blog post: https://my.oschina.net/u/2291 ...

Because we are writing to a regular file, standard I/O uses full buffering: data is held in a buffer and only handed to the system write queue once the buffer fills up.
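Full buffering can be observed without any child process at all. The following is a minimal sketch, not the article's original code: it uses Python 3 syntax, a temporary directory, and a hypothetical helper name (sizes_before_and_after_flush), and simply compares the on-disk size of the file before and after flush().

```python
import os
import tempfile


def sizes_before_and_after_flush(text="hello, func put write\n"):
    """Return the on-disk size of a freshly written file before and after flush()."""
    path = os.path.join(tempfile.mkdtemp(), "error1.log")
    fd = open(path, "w")            # regular file, default (full) buffering
    fd.write(text)                  # the text sits in the user-space buffer
    before = os.path.getsize(path)  # nothing has reached the file yet
    fd.flush()                      # hand the buffered data to the OS
    after = os.path.getsize(path)
    fd.close()
    return before, after


if __name__ == "__main__":
    print(sizes_before_and_after_flush())
```

The small write stays in the buffer, so the file is still empty until flush() (or close()) pushes the data out.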

That settles both questions at once. Those crazy 'o' characters fill the whole buffer, so the buffered content gets flushed into the write queue. As for where 4075 comes from: it is 4096 - len("hello, func put write\n") + 1. Why the + 1? Because a buffer that is exactly full is not enough; the data has to exceed the buffer size to trigger the write.

So we now have an answer. To write to a file this way from a multiprocessing.Process child, there are three options:
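The arithmetic behind 4075 can be checked in a few lines. This is only a sketch: the 4096-byte buffer size is the value the article assumes (the real size depends on the platform), and padding_to_overflow is a hypothetical helper name.

```python
BUFFER_SIZE = 4096  # buffer size assumed in the article; platform-dependent in practice
first_write = "hello, func put write\n"


def padding_to_overflow(buffer_size, already_written):
    """Number of extra bytes needed so the total exceeds the buffer by exactly one."""
    return buffer_size - already_written + 1


padding = padding_to_overflow(BUFFER_SIZE, len(first_write))
print(len(first_write), padding, len(first_write) + padding)
```

The first write is 22 bytes long (21 characters plus the newline), so 4075 more bytes bring the total to 4097, one byte past the assumed buffer size.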

    • Fill the buffer completely

    • Call flush() manually

    • Make the file object unbuffered

The first two were covered above, so let's briefly look at the third:

From the official Python documentation:

    open(name[, mode[, buffering]])
    ...
    The optional buffering argument specifies the file's desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used. [2]

In other words, setting buffering to 0 in open() puts the file in unbuffered mode, so every write goes straight to the write queue instead of into a buffer. (This is the slowest option.)
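The second and third options can be sketched as follows. Several things here are assumptions rather than the article's code: the sketch uses Python 3, where buffering=0 is only allowed in binary mode; it requests the "fork" start method explicitly, so it assumes a Unix-like system; and write_from_child is a hypothetical helper name.

```python
import multiprocessing
import os
import tempfile

MESSAGE = b"hello, func put write\n"


def put(fd, flush):
    fd.write(MESSAGE)
    if flush:
        fd.flush()  # option 2: empty the user-space buffer before the child exits


def write_from_child(buffering, flush):
    """Open a file in the parent, write to it from a forked child, return its contents."""
    path = os.path.join(tempfile.mkdtemp(), "error1.log")
    ctx = multiprocessing.get_context("fork")   # assumption: fork() is available
    fd = open(path, "wb", buffering=buffering)  # binary mode so buffering=0 is legal
    p = ctx.Process(target=put, args=(fd, flush))
    p.start()
    p.join()
    fd.close()  # the parent's own buffer is empty; all writing happened in the child
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    print(write_from_child(buffering=-1, flush=True))   # option 2: explicit flush()
    print(write_from_child(buffering=0, flush=False))   # option 3: unbuffered file
```

In both cases the data reaches the file before the child exits, so it no longer matters how the child terminates.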

------------------------------------------------ I'm a dividing line ----------------------------------------------

Having covered the symptom and the workarounds, let's dig a little deeper.

You have surely noticed that files usually get written just fine even when we never close the file object or call flush() explicitly. So what is going on there?

In fact, when a program exits normally, the exiting process does some cleanup for us: closing open file descriptors, removing temporary files, freeing memory, and so on. Thanks to this "good habit", our data is flushed into the write queue when the file is closed, and the file contents are not lost.

With that in mind, look at the problem again: when the child process runs put and then exits, the file is evidently not closed as part of the exit, so the data in the buffer is lost.

Let's follow that clue and look at the implementation of Process:

multiprocessing/process.py

    def start(self):
        '''
        Start child process
        '''
        assert self._popen is None, 'cannot start a process twice'
        assert self._parent_pid == os.getpid(), \
               'can only start a process object created by current process'
        assert not _current_process._daemonic, \
               'daemonic processes are not allowed to have children'
        _cleanup()
        if self._Popen is not None:
            Popen = self._Popen
        else:
            from .forking import Popen
        self._popen = Popen(self)
        _current_process._children.add(self)

And what does Popen do?

multiprocessing/forking.py

    class Popen(object):

        def __init__(self, process_obj):
            sys.stdout.flush()
            sys.stderr.flush()
            self.returncode = None

            self.pid = os.fork()
            if self.pid == 0:
                if 'random' in sys.modules:
                    import random
                    random.seed()
                code = process_obj._bootstrap()
                sys.stdout.flush()
                sys.stderr.flush()
                os._exit(code)

The key is the final os._exit(code). Why is it the key? Because the way the process exits determines which cleanup steps get performed.

What on earth is os._exit? It is simply the C standard library's _exit(), so it is easy to read up on:

https://my.oschina.net/u/2291 ...

From the link above it is clear that _exit() and exit() are quite different: _exit() is simple and brutal, discarding user-space state and going straight into the kernel, while exit() patiently cleans up for us first.

So can we hypothesize: what if Popen did not exit via os._exit()?

Luckily, sys.exit() gives us the exit()-style cleanup we want. Without further ado, let's try it!
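The difference between the two exit paths can be reproduced without touching the multiprocessing source. The sketch below is an illustration under assumptions, not the article's experiment: it uses Python 3 and the subprocess module to run a tiny child script that writes to a file without flushing and then exits in the requested way; CHILD_TEMPLATE and run_child are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile

# Child script: write without flushing, then exit in the requested way.
CHILD_TEMPLATE = """
import os, sys
f = open({path!r}, "w")
f.write("hello, func put write\\n")
{exit_call}
"""


def run_child(exit_call):
    """Run the child script with the given exit call and return what reached the file."""
    path = os.path.join(tempfile.mkdtemp(), "error1.log")
    code = CHILD_TEMPLATE.format(path=path, exit_call=exit_call)
    subprocess.check_call([sys.executable, "-c", code])
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    print(repr(run_child("os._exit(0)")))  # buffer discarded, file stays empty
    print(repr(run_child("sys.exit(0)")))  # normal shutdown flushes the buffer
```

With os._exit() the buffered text never reaches the file, while sys.exit() goes through the interpreter's normal shutdown, which closes the file and flushes it.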

multiprocessing/forking.py

    class Popen(object):

        def __init__(self, process_obj):
            sys.stdout.flush()
            sys.stderr.flush()
            self.returncode = None

            self.pid = os.fork()
            if self.pid == 0:
                if 'random' in sys.modules:
                    import random
                    random.seed()
                code = process_obj._bootstrap()
                sys.stdout.flush()
                sys.stderr.flush()
                #os._exit(code)
                sys.exit(code)

Test with the code reverted to the original version, without the 'o' padding:

    [root@iz23pynfq19z ~]# python 2.py; cat error1.log
    Put
    END
    hello, func put write

As we can see, the write now succeeds, which shows the explanation above holds up.

However, it is best not to modify the library source. After all, it reflects the careful work of those who came before us, and they may well have written it this way deliberately to avoid certain problems. Better to discipline your own code and keep such seemingly non-standard usage to a minimum.

Discussion and corrections from experts are welcome. If you repost this article, please credit the source: HTTPS://SEGMENTFAULT.COM/A/11 ...


