Calling an external sub-process from Python with asynchronous standard input and output through pipes

We often run into the following requirement: a complex functional module is implemented in C++ or another lower-level language, and we need to build a web-based demo for querying data through it. Python is powerful and concise, which makes it well suited to building a demo; the Flask framework and the jinja2 module give Python convenient web development capabilities, and Python can easily interact with code written in other languages. We therefore chose Python as the demo development tool.

Assume that the module we want to call (the one providing the underlying service) reads data from standard input in a loop and writes each result to standard output once processing is complete. This is a common pattern on Linux and relies on Linux's powerful redirection capability. Unfortunately, the underlying module has a very heavy initialization phase, so we cannot spawn a new sub-process for every query request. The solution is to spawn the sub-process only once and then, for each request, interact with it through pipes.
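For experimenting without the real module, the kind of worker described above can be imitated in a few lines. The following is only a sketch under assumptions: the function name serve, the init_seconds parameter, and the lower-casing "processing" are all made up for illustration and say nothing about how the real module works.

```python
import io
import time


def serve(infile, outfile, init_seconds=0.0):
    """Hypothetical stand-in worker: one request per line, one reply per line."""
    time.sleep(init_seconds)  # stands in for the heavy one-time initialization
    for line in infile:
        # Placeholder "processing": trim whitespace and lower-case the request.
        outfile.write(line.strip().lower() + '\n')
        # Flush after every reply so the parent sees it immediately
        # instead of waiting for a buffer to fill up.
        outfile.flush()


# In-memory run; a real worker would call serve(sys.stdin, sys.stdout).
replies = io.StringIO()
serve(io.StringIO('Cloud\n  YunTi \n'), replies)
```

Flushing after each reply matters for this design: a worker that buffers its output would leave the parent waiting even though the request has already been processed.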
The subprocess module of Python makes it easy to spawn sub-processes, much like calling fork and exec on Linux. A Popen object starts an external executable without blocking the parent, so we use Popen to implement the requirement. To write data to the sub-process's standard input, pass stdin=subprocess.PIPE when creating the Popen object; similarly, to read data from the sub-process's standard output, pass stdout=subprocess.PIPE. Let's take a look at a simple example:
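    from subprocess import Popen, PIPE

    x = 1  # example value; any integer works
    p = Popen('less', stdin=PIPE, stdout=PIPE)
    # On Python 3 the pipes carry bytes, so the text is encoded first.
    p.communicate(('Line number %d.\n' % x).encode())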
The communicate function returns a tuple (stdoutdata, stderrdata) containing the sub-process's standard output and standard error data (stderrdata is None here because stderr was not redirected to a pipe). However, communicate blocks the parent process until the sub-process exits and closes the pipes, so it can be called only once per Popen object. Handling multiple requests this way would mean creating a new Popen object, and re-initializing the sub-process, for every request, which does not meet our needs.
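The once-only behaviour of communicate can be checked directly. The following is a minimal sketch, assuming Python 3 on a POSIX system where the cat command is available; cat is only a stand-in for the real module, and the function name communicate_twice is made up for illustration.

```python
from subprocess import Popen, PIPE


def communicate_twice():
    # 'cat' echoes its standard input; it stands in for the real module.
    p = Popen(['cat'], stdin=PIPE, stdout=PIPE)
    out, err = p.communicate(b'Line number 1.\n')  # blocks until cat exits
    pipes_closed = p.stdin.closed and p.stdout.closed
    try:
        # Try to send a second request through the same Popen object.
        p.communicate(b'Line number 2.\n')
        second_call_ok = True
    except ValueError:
        # Input can no longer be sent once communication has started.
        second_call_ok = False
    return out, err, pipes_closed, second_call_ok
```

After the first call both pipes are closed and the sub-process has exited, so a second request would need a fresh Popen object and therefore a fresh initialization.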
Therefore, we have to write to and read from the stdin and stdout objects of the Popen object directly. Unfortunately, by default a read() on the sub-process's standard output blocks until the sub-process closes that output, which normally happens only when it exits. In effect, both subprocess and os.popen* let us send input and collect output just once, and the output becomes available only when the process terminates.
After some research, I found that the fcntl function of the fcntl module can switch the sub-process's standard output pipe to non-blocking mode, which achieves our goal. The problem that had plagued me for a long time was finally solved. The code is as follows:
    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    # author: weisu.yxd@taobao.com
    # Written for Python 3 on a POSIX system (the fcntl module is not available on Windows).
    from subprocess import Popen, PIPE
    import fcntl
    import os
    import time


    class Server(object):
        def __init__(self, args, server_env=None):
            # Start the sub-process once; env=None makes it inherit our environment.
            self.process = Popen(args, stdin=PIPE, stdout=PIPE, stderr=PIPE,
                                 env=server_env or None)
            # Switch the sub-process's output pipes to non-blocking mode.
            for pipe in (self.process.stdout, self.process.stderr):
                flags = fcntl.fcntl(pipe, fcntl.F_GETFL)
                fcntl.fcntl(pipe, fcntl.F_SETFL, flags | os.O_NONBLOCK)

        def send(self, data, tail='\n'):
            # The pipes carry bytes, so encode the request before writing it.
            self.process.stdin.write((data + tail).encode('utf-8'))
            self.process.stdin.flush()

        def recv(self, t=.1, e=1, tr=5, stderr=0):
            # Wait t seconds, then poll for up to t more seconds in about tr steps.
            time.sleep(t)
            if tr < 1:
                tr = 1
            x = time.time() + t
            pr = self.process.stderr if stderr else self.process.stdout
            while time.time() < x:
                r = pr.read()
                if r is None:
                    # Nothing available yet: sleep briefly and try again.
                    time.sleep(max((x - time.time()) / tr, 0))
                elif r:
                    return r.decode('utf-8').rstrip()
                elif e:
                    # An empty read means the sub-process closed the pipe.
                    raise Exception('sub-process closed its output')
                else:
                    break
            return ''


    if __name__ == "__main__":
        ServerArgs = ['/home/weisu.yxd/QP/trunk/bin/normalizer',
                      '/home/weisu.yxd/QP/trunk/conf/stopfile.txt']
        server = Server(ServerArgs)
        test_data = 'cloud', 'yunti', 'mosad', 'Alisa', 'idb', 'Alibaba big data'
        for x in test_data:
            server.send(x)
            print(x, server.recv())
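To try the same idea without the real module, the non-blocking read can be exercised against cat, which simply echoes each line back. This is a minimal sketch, assuming Python 3 on Linux with cat available; the helper names set_nonblocking, query, and demo are made up for illustration and are not part of the code above.

```python
import fcntl
import os
import time
from subprocess import Popen, PIPE


def set_nonblocking(pipe):
    """Add O_NONBLOCK to the pipe's file status flags."""
    flags = fcntl.fcntl(pipe, fcntl.F_GETFL)
    fcntl.fcntl(pipe, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def query(child, line, timeout=2.0, interval=0.01):
    """Send one line, then poll stdout until a complete line comes back."""
    child.stdin.write(line.encode('utf-8') + b'\n')
    child.stdin.flush()
    deadline = time.time() + timeout
    received = b''
    while time.time() < deadline:
        chunk = child.stdout.read()
        if chunk is None:
            # No data available yet; a blocking pipe would hang here instead.
            time.sleep(interval)
        elif chunk == b'':
            raise RuntimeError('sub-process closed its output')
        else:
            received += chunk
            if received.endswith(b'\n'):
                return received.decode('utf-8').rstrip('\n')
    raise RuntimeError('no complete reply within %.1f seconds' % timeout)


def demo():
    child = Popen(['cat'], stdin=PIPE, stdout=PIPE)  # started only once
    set_nonblocking(child.stdout)
    before = child.stdout.read()  # nothing has been sent, so no data yet
    replies = [query(child, word) for word in ('cloud', 'yunti')]
    child.stdin.close()  # end of input lets cat exit
    child.wait()
    child.stdout.close()
    return before, replies
```

Calling demo() sends two requests to the same sub-process. The read issued before any request returns None rather than blocking, which is the behaviour that makes repeated request and reply rounds possible.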
In addition, when calling some external programs, you may need to specify the corresponding environment variables as follows:
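    import os
    import server  # the code above, assumed to be saved as server.py

    my_env = os.environ.copy()  # copy, so our own environment stays unchanged
    my_env["LD_LIBRARY_PATH"] = "/path/to/lib"
    srv = server.Server(cmd, my_env)  # cmd: argument list of the external program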
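That the variable reaches only the sub-process can be verified with a small self-contained check. This is a sketch under assumptions: the function name run_with_env and the variable name DEMO_LIB_PATH are hypothetical, and a short Python one-liner stands in for the external program.

```python
import os
import sys
from subprocess import Popen, PIPE


def run_with_env(name, value):
    """Start a child with one extra environment variable and report what it sees."""
    child_env = os.environ.copy()  # leave the parent's environment untouched
    child_env[name] = value
    code = 'import os; print(os.environ[%r])' % name
    p = Popen([sys.executable, '-c', code], stdout=PIPE, env=child_env)
    out, _ = p.communicate()
    return out.decode('utf-8').strip()


seen_by_child = run_with_env('DEMO_LIB_PATH', '/path/to/lib')
```

Working on a copy of os.environ keeps the setting local to the sub-process, which avoids surprising other code that runs in the parent.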