multiprocessing is Python's multi-process module, and as such it is the darling of multi-process programming in Python. But the problem we discuss today deserves careful attention.
Let's go straight to the code:
```python
from multiprocessing import Process, Lock

err_file = 'error1.log'
err_fd = open(err_file, 'w')

def put(fd):
    print "put"
    fd.write("hello, func put write\n")
    print "end"

if __name__ == '__main__':
    p_list = []
    for i in range(1):
        p_list.append(Process(target=put, args=(err_fd,)))
    for p in p_list:
        p.start()
    for p in p_list:
        p.join()
```
The intent of the code above is clear: spawn a child process via multiprocessing.Process to run the put function. put itself is equally clear: it prints put and end, and writes "hello, func put write" to the file error1.log.
So the output should be exactly as described: put and end on the terminal, and the sentence "hello, func put write" in error1.log. The world, however, is never that simple. The actual result is:
```
[root@iz23pynfq19z ~]# py27 2.py ; cat error1.log
put
end
[root@iz23pynfq19z ~]#
```
What!? Why is there nothing in error1.log!?
Let's tweak the code a little and witness something magical:
```python
from multiprocessing import Process, Lock

err_file = 'error1.log'
err_fd = open(err_file, 'w')

def put(fd):
    print "put"
    fd.write("hello, func put write\n")
    fd.write("o" * 4075)  # a magical line
    print "end"

if __name__ == '__main__':
    p_list = []
    for i in range(1):
        p_list.append(Process(target=put, args=(err_fd,)))
    for p in p_list:
        p.start()
    for p in p_list:
        p.join()
```
The output:
```
[root@iz23pynfq19z ~]# py27 2.py ; cat error1.log
put
end
hello, func put write
oooo...(4,075 of them in total)
[root@iz23pynfq19z ~]#
```
Feeling thoroughly confused yet!?
Two questions now come to mind:
Why could the first program not write the sentence, while the second one could?
What on earth is 4075?
Before answering them, we need to be clear about the buffering modes of the standard I/O library: full buffering, line buffering, and no buffering.
For details, see the earlier post: https://my.oschina.net/u/2291 ...
Because we are writing to a regular file here, standard I/O applies full buffering: data accumulates in a user-space buffer and is flushed to the kernel write queue only when the buffer fills up.
That resolves the mystery in one stroke. The crazy run of 'o' characters fills the entire buffer, so the system flushes our content to the write queue. And where does 4075 come from? It is 4096 - sizeof("hello, func put write\n") + 1. Why the +1? Because an exactly full buffer is not enough; the write has to exceed the buffer size to trigger the flush.
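To sanity-check that arithmetic, here is a two-line sketch. It assumes the typical 4096-byte stdio buffer on Linux; the actual size is platform-dependent:

```python
msg = "hello, func put write\n"  # the 22 bytes written by put()
print 4096 - len(msg) + 1        # -> 4075, the smallest padding that overflows the buffer
```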
So we can now arrive at an answer. If we want to write to a file this way inside a multiprocessing.Process, there are three ways to avoid the problem: open the file unbuffered, call flush() explicitly after writing, or close the file object in the child before it exits. On the first option, the official Python documentation says:
open(name[, mode[, buffering]]) ... The optional buffering argument specifies the file's desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used. [2]
In other words, passing buffering=0 to open() selects unbuffered mode: every write goes straight to the kernel write queue instead of sitting in a user-space buffer. (It is also the worst-performing of the three options.)
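As a minimal sketch (Python 2.7, matching the environment above), here is the first program with all three fixes marked; any single one of them is enough:

```python
from multiprocessing import Process

err_file = 'error1.log'
err_fd = open(err_file, 'w', 0)  # fix 1: buffering=0, writes bypass the user-space buffer

def put(fd):
    print "put"
    fd.write("hello, func put write\n")
    # fd.flush()                 # fix 2: flush the user-space buffer explicitly
    # fd.close()                 # fix 3: close the file object in the child
    print "end"

if __name__ == '__main__':
    p = Process(target=put, args=(err_fd,))
    p.start()
    p.join()
```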
------------------------------------------------ I'm a dividing line ------------------------------------------------
Having covered the symptom and the workarounds, let's dig a little deeper.
I am sure we have all noticed that even when a file object is never explicitly closed and flush is never explicitly called, the data can still end up in the file. What is going on there?
In fact, when a program exits normally, the dying process ties up some loose ends for us: it closes open file descriptors, cleans up temporary files, frees memory, and so on. It is precisely because of this good habit of the system that our buffered data is flushed into the kernel write queue when the file object is closed, and the file contents are not lost.
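A minimal sketch of that good habit; demo.log is a hypothetical file name, and a normal interpreter exit is assumed:

```python
# No close(), no flush() -- yet after the script ends, demo.log contains
# the line, because the interpreter's shutdown cleanup closes (and thereby
# flushes) the file object for us.
fd = open('demo.log', 'w')
fd.write("written without close or flush\n")
# the script simply ends here: normal exit, buffers flushed
```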
With this understanding, let's revisit our problem. The child process that runs put never explicitly closes the file descriptor, and when it exits, the data in its buffer is lost anyway. Why did the usual exit-time cleanup not kick in?
Let's take a look at how Process is implemented.
multiprocessing/process.py

```python
def start(self):
    '''
    Start child process
    '''
    assert self._popen is None, 'cannot start a process twice'
    assert self._parent_pid == os.getpid(), \
           'can only start a process object created by current process'
    assert not _current_process._daemonic, \
           'daemonic processes are not allowed to have children'
    _cleanup()
    if self._Popen is not None:
        Popen = self._Popen
    else:
        from .forking import Popen
    self._popen = Popen(self)
    _current_process._children.add(self)
```
Then what does Popen do?
multiprocessing/forking.py

```python
class Popen(object):

    def __init__(self, process_obj):
        sys.stdout.flush()
        sys.stderr.flush()
        self.returncode = None

        self.pid = os.fork()
        if self.pid == 0:
            if 'random' in sys.modules:
                import random
                random.seed()
            code = process_obj._bootstrap()
            sys.stdout.flush()
            sys.stderr.flush()
            os._exit(code)
```
The key line is the final os._exit(code). Why call it the key? Because this exit path determines which loose ends the process will, or will not, tie up on its way out.
What exactly is os._exit? It is simply the _exit of the C standard library, which makes it easy to study.
See: https://my.oschina.net/u/2291 ...
From the link above we can clearly see that _exit() and exit() are a contrasting pair: _exit() is simple and brutal, discarding the user-space state (including the stdio buffers) and dropping straight into the kernel, while exit() patiently cleans up for us first.
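A minimal sketch of the difference, assuming Linux; demo.log is again a hypothetical name:

```python
import os, sys

fd = open('demo.log', 'w')
fd.write("will this survive?\n")
os._exit(0)    # brutal exit: the buffered line is discarded, demo.log stays empty
# sys.exit(0)  # patient exit: normal cleanup runs and the line reaches the file
```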
Can we then test the hypothesis: what happens if Popen does not exit via os._exit()?
Luckily, sys.exit() is exactly the exit() we just discussed. Without further ado, let's try it!
multiprocessing/forking.py

```python
class Popen(object):

    def __init__(self, process_obj):
        sys.stdout.flush()
        sys.stderr.flush()
        self.returncode = None

        self.pid = os.fork()
        if self.pid == 0:
            if 'random' in sys.modules:
                import random
                random.seed()
            code = process_obj._bootstrap()
            sys.stdout.flush()
            sys.stderr.flush()
            #os._exit(code)
            sys.exit(code)
```
Testing again with the original code, the version without the 'o' padding:
```
[root@iz23pynfq19z ~]# python 2.py ; cat error1.log
put
end
hello, func put write
```
The sentence is written successfully, which proves that the explanation above holds water.
However, it is best not to modify the library source like this. These lines are the result of the maintainers' careful work, and os._exit() may well be deliberate (most likely to keep the forked child from re-running the parent's exit-time cleanup and flushing inherited buffers a second time). Better to standardize our own code instead and avoid such seemingly nonstandard patterns.
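Instead of patching the library, here is a sketch of the disciplined pattern: each child opens, writes, and closes its own file object, so no user-space buffer ever straddles the fork/exit boundary. Append mode is assumed so that multiple children do not truncate one another:

```python
from multiprocessing import Process

def put(err_file):
    # 'with' closes -- and therefore flushes -- the file before the child
    # ever reaches os._exit()
    with open(err_file, 'a') as fd:
        fd.write("hello, func put write\n")

if __name__ == '__main__':
    p = Process(target=put, args=('error1.log',))
    p.start()
    p.join()
```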
Questions and discussion are welcome. If you repost, please credit: https://segmentfault.com/a/11 ...