Python/NumPy Big Data Programming Experience

1. Save data as you process it; do not leave everything to a single save at the end. Otherwise, if the program hangs after running for hours or even days, you are left with nothing. Even if the partial results are not useful in themselves, you can use them to analyze problems in the program flow or the characteristics of the data.

2. Release large chunks of memory promptly with del. By default, Python releases a variable only when it goes out of scope, even if later code never uses it again, so large arrays need to be released manually. Note that the array is freed only after every reference to it has been deleted. Such references include views like a[2:]; even np.split only creates views and does not actually move the memory into separate arrays.

3. When multiplying a matrix by a diagonal matrix, use broadcast (element-wise) multiplication instead. It can be tens or even hundreds of times faster:

    M.dot(np.diag(v))    # slow
    M * v                # fast, same result

4. Reuse memory wherever possible. For example,

    sqrtW = np.sqrt(W)   # W is not used later

spends extra time allocating memory for sqrtW. It can be rewritten as

    np.sqrt(W, out=W)    # in-place sqrt
    sqrtW = W            # take a user-friendly name as its reference

Similarly,

    A = B + C            # B is never used later

can be rewritten as

    B += C
    A = B

5. Use IPython's run -p prog.py to profile the program and find where most of the time is spent. You can also implement a simple timer class that prints out how long each time-consuming step takes.

6. Strip the real code down to a skeleton that uses the same amount of memory and performs the same number of operations, so that you can estimate the algorithm's time and space cost beforehand. This can be done block by block. For example:

    ... complex and slow routine to compute V11, Wsum, Gwmean ...
    for i in range(noncore_size):
        Wi = Wsum[i]
        VW = V11.T * Wi
        VWV = VW.dot(V11)
        V21[i] = np.linalg.inv(VWV).dot(VW.dot(Gwmean[i]))

You can write a test.py that initializes V11, Wsum, and Gwmean randomly with np.random.randn() and then runs only this block. That shows roughly how much memory is needed and how long each iteration takes, without first waiting a long time for the real values of these variables to be computed.

7. If you are on Windows, turn off the option to install updates automatically. Otherwise you may leave the program running all night, come back to check the results, and find that Windows has restarted itself... cry.
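
The tips on views, diagonal multiplication, and in-place operations can be checked on small arrays. The following is only a minimal sketch: the array sizes and the random seed are arbitrary assumptions, and the names slow, fast, tail, and parts are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, assumed for reproducibility
M = rng.standard_normal((200, 300))
v = rng.standard_normal(300)

# Diagonal multiply: builds a 300x300 matrix, then does a full matrix product.
slow = M.dot(np.diag(v))
# Broadcasting scales column j of M by v[j] without the temporary diagonal matrix.
fast = M * v
same_result = np.allclose(slow, fast)

# Slices and np.split return views, so they keep the parent buffer alive.
a = np.zeros(1000)
tail = a[2:]
parts = np.split(a, 4)
tail_is_view = np.shares_memory(a, tail)
parts_are_views = all(np.shares_memory(a, p) for p in parts)

# In-place square root: the result overwrites W instead of filling a new array.
W = rng.random(1000)
expected = np.sqrt(W)  # extra copy, kept here only to check the result
np.sqrt(W, out=W)
sqrtW = W              # new name, same memory
in_place_ok = bool(np.allclose(sqrtW, expected)) and sqrtW is W

# In-place add instead of A = B + C.
B = np.ones(5)
C = np.arange(5.0)
B += C
A = B
```

Because tail and every element of parts share memory with a, running del a alone would not free the buffer; the views would have to be deleted as well.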
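
Saving results as you go and timing each step with a small timer class might look roughly like the sketch below. Timer, process_chunk, the temporary output directory, and the result_NNN.npy file naming are all hypothetical choices made for illustration, not part of the original program.

```python
import os
import tempfile
import time

import numpy as np


class Timer:
    """Context manager that records and prints elapsed wall-clock time."""

    def __init__(self, label):
        self.label = label
        self.elapsed = None

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        print("%s: %.4f s" % (self.label, self.elapsed))
        return False  # do not swallow exceptions


def process_chunk(chunk):
    # Hypothetical placeholder for the real, slow computation.
    return chunk ** 2


out_dir = tempfile.mkdtemp()  # assumed location; use a permanent path in practice
data = np.arange(12.0).reshape(4, 3)
saved_paths = []
for i, chunk in enumerate(data):
    with Timer("chunk %d" % i) as t:
        result = process_chunk(chunk)
    # Write each partial result immediately so a later crash loses little work.
    path = os.path.join(out_dir, "result_%03d.npy" % i)
    np.save(path, result)
    saved_paths.append(path)

# The partial files can be reassembled afterwards.
restored = np.vstack([np.load(p) for p in saved_paths])
```

If the run dies at chunk k, the files for chunks 0 to k-1 are still on disk and can be inspected or used to resume.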