Considerations when sharing file objects between parent and child processes with Python's multiprocessing

Source: Internet
Author: User
multiprocessing is Python's multi-process module, and Process is its workhorse for spawning child processes. But the issue we discuss today is one that deserves real attention.

Directly on the code:

from multiprocessing import Process, Lock

err_file = 'Error1.log'
err_fd = open(err_file, 'w')

def put(fd):
    print "Put"
    fd.write("Hello, Func put write\n")
    print "END"

if __name__ == '__main__':
    p_list = []
    for i in range(1):
        p_list.append(Process(target=put, args=(err_fd,)))
    for p in p_list:
        p.start()
    for p in p_list:
        p.join()

The intent of the code is clear: use multiprocessing.Process to spawn a child process that executes the put function. put itself is equally clear: it prints Put and END, and writes "Hello, Func put write" to the file Error1.log.

So the output ought to be just as described: Put and END on the terminal, and the sentence "Hello, Func put write" in Error1.log. The world, however, is never that easy. The actual result is:

[root@iz23pynfq19z ~]# py27 2.py; cat Error1.log
Put
END
[root@iz23pynfq19z ~]#

What!? Why is Error1.log empty!?

Let's tweak the code a little and witness something magical:

from multiprocessing import Process, Lock

err_file = 'Error1.log'
err_fd = open(err_file, 'w')

def put(fd):
    print "Put"
    fd.write("Hello, Func put write\n")
    fd.write("o" * 4075)  # a magical line
    print "END"

if __name__ == '__main__':
    p_list = []
    for i in range(1):
        p_list.append(Process(target=put, args=(err_fd,)))
    for p in p_list:
        p.start()
    for p in p_list:
        p.join()

Output Result:

[root@iz23pynfq19z ~]# py27 2.py; cat Error1.log
Put
END
Hello, Func put write
oooo... (4,075 of them)
[root@iz23pynfq19z ~]#

Feeling confused yet!?

Now, two questions come to mind:

    1. Why can't the first program write the sentence, while the second one can?

    2. Where does that 4075 come from?

Before explaining these issues, we need to be clear about the buffering modes of the standard I/O library: full buffering, line buffering, and no buffering.

For details, see the previous blog post: https://my.oschina.net/u/2291 ...

Because we are writing to a file here, standard I/O uses full buffering: data accumulates in the user-space buffer and is pushed into the kernel's write queue only when the buffer fills.

That resolves the puzzle in one stroke: the flood of 'o' characters fills the buffer, so the system flushes our content into the write queue. And where does 4075 come from? It is 4096 - len("Hello, Func put write\n") + 1. Why the +1? Because the buffer must overflow, not merely fill, before the write is triggered.
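The arithmetic can be checked directly. A quick sketch, assuming the 4096-byte stdio buffer the article observes on its system:

```python
msg = "Hello, Func put write\n"
buffer_size = 4096                    # full-buffer size assumed in the article
padding = buffer_size - len(msg) + 1  # +1: the buffer must overflow, not merely fill
print(len(msg), padding)              # -> 22 4075
```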

So we can now give an answer: if we want to write to a file this way inside a multiprocessing.Process, there are three options:

    • Fill the buffer completely

    • Manually call flush()

    • Set the file object to unbuffered

The first two options are covered above, so let's briefly discuss the third:

From the official Python documentation for open(name[, mode[, buffering]]):

    The optional buffering argument specifies the file's desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used. [2]

In other words, passing buffering=0 to open() selects unbuffered mode: writes go directly into the kernel's write queue instead of a user-space buffer. (This is the worst-performing option.)
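A minimal sketch of options 2 and 3 in Python 3 syntax (file names are illustrative; note that Python 3 only accepts buffering=0 in binary mode):

```python
import os

# Option 2: keep full buffering, but flush manually.
fd = open("demo_flush.log", "w")                  # file target -> fully buffered
fd.write("hello\n")
size_before = os.path.getsize("demo_flush.log")   # data still sits in the buffer
fd.flush()                                        # push the buffer into the kernel
size_after = os.path.getsize("demo_flush.log")
fd.close()

# Option 3: no buffering at all (binary mode required in Python 3).
raw = open("demo_raw.log", "wb", buffering=0)
raw.write(b"hi\n")                                # goes straight to the kernel
size_raw = os.path.getsize("demo_raw.log")
raw.close()

print(size_before, size_after, size_raw)          # -> 0 6 3
```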

------------------------------------------------I'm a cutting line----------------------------------------------

Having covered the phenomenon and its workarounds, let's dig a little deeper.

You may have noticed that even when the file object is neither closed nor explicitly flushed, the file usually still gets written. What is going on there?

In fact, when a program exits normally, the process ties up loose ends for us: closing open file descriptors, cleaning up temporary files, releasing memory, and so on. It is thanks to this "good habit" that our data gets flushed into the write queue when the file object is closed on exit, so the file contents are not lost.

With that understanding, look back at our problem: when the child process calls put and then exits, it never explicitly closes the file object, so the data still sitting in the buffer is lost.

Let's take a look at the implementation of Process:

multiprocessing/process.py

    def start(self):
        '''
        Start child process
        '''
        assert self._popen is None, 'cannot start a process twice'
        assert self._parent_pid == os.getpid(), \
               'can only start a process object created by current process'
        assert not _current_process._daemonic, \
               'daemonic processes are not allowed to have children'
        _cleanup()
        if self._Popen is not None:
            Popen = self._Popen
        else:
            from .forking import Popen
        self._popen = Popen(self)
        _current_process._children.add(self)

So what does Popen do?

multiprocessing/forking.py

    class Popen(object):
        def __init__(self, process_obj):
            sys.stdout.flush()
            sys.stderr.flush()
            self.returncode = None

            self.pid = os.fork()
            if self.pid == 0:
                if 'random' in sys.modules:
                    import random
                    random.seed()
                code = process_obj._bootstrap()
                sys.stdout.flush()
                sys.stderr.flush()
                os._exit(code)

The key line is the final os._exit(code). Why is it the most important? Because how the process exits determines what loose ends it ties up.

So what exactly is os._exit? It is simply a wrapper around the standard C library's _exit(), which makes it easy to study:

https://my.oschina.net/u/2291 ...

From the link above it is clear that _exit() and exit() are an instructive contrast: _exit() is simple and brutal, discarding user-space state (including stdio buffers) and dropping straight into the kernel, while exit() patiently cleans up for us first.
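The contrast can be demonstrated with os.fork directly. A Unix-only sketch (file names are illustrative), comparing a child that dies via os._exit against one that exits via sys.exit and so gets normal interpreter cleanup:

```python
import os
import sys

def child_write(path, use_os_exit):
    sys.stdout.flush()            # mirror Popen.__init__: avoid duplicating buffered output
    pid = os.fork()
    if pid == 0:                  # child
        fd = open(path, "w")
        fd.write("hello\n")       # sits in the user-space stdio buffer
        if use_os_exit:
            os._exit(0)           # brutal: buffer discarded, file stays empty
        sys.exit(0)               # polite: interpreter shutdown flushes and closes fd
    os.waitpid(pid, 0)            # parent waits for the child to finish
    return open(path).read()

lost = child_write("exit_demo_a.log", True)
kept = child_write("exit_demo_b.log", False)
print(repr(lost), repr(kept))     # -> '' 'hello\n'
```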

Can we then test the hypothesis: what if Popen's exit were not os._exit()?

Luckily, sys.exit() gives us exactly the well-mannered exit() we want. Without further ado, let's try it!

multiprocessing/forking.py

    class Popen(object):
        def __init__(self, process_obj):
            sys.stdout.flush()
            sys.stderr.flush()
            self.returncode = None

            self.pid = os.fork()
            if self.pid == 0:
                if 'random' in sys.modules:
                    import random
                    random.seed()
                code = process_obj._bootstrap()
                sys.stdout.flush()
                sys.stderr.flush()
                #os._exit(code)
                sys.exit(code)

Re-run the original version of the test code, without the 'o' padding:

[root@iz23pynfq19z ~]# python 2.py; cat Error1.log
Put
END
Hello, Func put write

The sentence is now written to the file, which confirms the explanation above.

That said, it is best not to modify the library source: it is the result of our predecessors' careful work, and os._exit() may well be there deliberately to avoid other problems. Better to standardize your own code and avoid these seemingly nonstandard patterns.
Discussion and corrections are welcome. When reproducing, please credit: https://segmentfault.com/a/11 ...


