This article introduces parallelization in Python and a simple application of it.
It focuses on the map function, which handles a whole series of operations in a single call: iterating over the sequence, passing the arguments, and saving the results.
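First, import the library, create a pool, and call map (crawl_function and url_list are placeholders for your own function and list of URLs):
from multiprocessing.dummy import Pool
pool = Pool(4)
results = pool.map(crawl_function, url_list)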
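As a minimal, self-contained sketch of this pattern, the snippet below swaps the crawl function for a hypothetical square helper and the URL list for a list of integers, so that it runs without network access. Both names are illustrative assumptions and do not come from the original code.

```python
from multiprocessing.dummy import Pool  # thread-based pool with the multiprocessing API


def square(n):
    # Hypothetical stand-in for the crawl function.
    return n * n


numbers = [1, 2, 3, 4, 5]  # hypothetical stand-in for the URL list

pool = Pool(4)                       # 4 worker threads
results = pool.map(square, numbers)  # blocks until every item has been processed
pool.close()                         # no more tasks will be submitted
pool.join()                          # wait for the worker threads to exit

print(results)  # [1, 4, 9, 16, 25]
```

Note that map returns the results in the same order as the input list, regardless of which thread finishes first.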
Below is a simple example of how to use the map function, compared with an ordinary loop.
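import time
import requests
from multiprocessing.dummy import Pool

def getsource(url):
    html = requests.get(url)

# NOTE: the upper bound of the page range was lost in the source text;
# 20 pages is an assumption, not the author's value.
LAST_PAGE = 20
urls = []
for i in range(1, LAST_PAGE + 1):
    newpage = 'http://tieba.baidu.com/p/3522395718?pn=' + str(i)
    urls.append(newpage)

# Test one: ordinary sequential loop
timex = time.time()
for i in urls:
    getsource(i)
print(time.time() - timex)
# Output:
# 10.2820000648

# Test two: thread pool with map
time1 = time.time()
pool = Pool(4)
results = pool.map(getsource, urls)
pool.close()
pool.join()
print(time.time() - time1)
# Output:
# 3.23600006104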
Comparing the two runs above, test two (about 3.24 seconds) is clearly much faster than test one (about 10.28 seconds).
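To reproduce this kind of comparison without depending on the network, here is a rough sketch that imitates a slow request with time.sleep. The fake_fetch function and the placeholder page names are assumptions made for illustration; actual timings will vary by machine and will not match the figures above.

```python
import time
from multiprocessing.dummy import Pool


def fake_fetch(url):
    # Hypothetical stand-in for requests.get: sleep to imitate network latency.
    time.sleep(0.1)
    return url


urls = ['page-%d' % i for i in range(8)]  # placeholder strings, not real URLs

# Sequential version: each call waits for the previous one to finish.
start = time.time()
sequential = [fake_fetch(u) for u in urls]
sequential_time = time.time() - start

# Pooled version: up to 4 calls wait at the same time.
start = time.time()
pool = Pool(4)
parallel = pool.map(fake_fetch, urls)
pool.close()
pool.join()
parallel_time = time.time() - start

print('sequential: %.2f s' % sequential_time)
print('parallel:   %.2f s' % parallel_time)
```

Threads help in this situation because the work is mostly waiting: while one thread is blocked on I/O (or on sleep, as here), the others can proceed.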
Explanation of the program:
Test one:
for i in urls:
    getsource(i)  # iterate over every URL in the urls list and call getsource on each one in turn
Test two:
pool = Pool(4)  # create a pool of 4 threads; choose the number based on the number of CPU cores in your computer
results = pool.map(getsource, urls)  # map takes the name of the custom function and the arguments to pass to it (here, a list); it calls the function once per element
pool.close()  # close the pool object so that it accepts no new tasks
pool.join()  # wait until all (4) threads have finished executing
print(time.time() - time1)  # print the elapsed time
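Because getsource above does not return anything, results would only hold a list of None values. The sketch below shows one way the results could be saved: the worker returns a value, and map collects those values in input order. The url_length worker and the saved dictionary are hypothetical names introduced for illustration, and the with statement is a swapped-in alternative to the explicit close and join calls, not the author's method.

```python
from multiprocessing.dummy import Pool


def url_length(url):
    # Hypothetical worker: returns a value so that map has something to collect.
    return len(url)


urls = ['http://tieba.baidu.com/p/3522395718?pn=' + str(i) for i in range(1, 4)]

# Leaving the with block terminates the pool; that is safe here because
# map has already returned every result by that point.
with Pool(4) as pool:
    results = pool.map(url_length, urls)

# map preserves input order, so each result can be paired with its URL.
saved = dict(zip(urls, results))

for url, length in saved.items():
    print(url, length)
```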
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.