Python/numpy Big Data Programming Experience

Source: Internet
Author: User

1. Save data as you go instead of saving everything at once at the end. Otherwise, if the program crashes after running for hours or even days, you are left with nothing. Even if the partial results are not usable on their own, they still let you analyze problems in the program flow or the characteristics of the data.

2. Release large blocks of memory promptly with del. By default, Python releases a variable only when it goes out of scope (variable scope), even if the variable is never used again in the rest of the code, so large arrays should be released manually. Note that an array's memory is freed only after every reference to it has been deleted. These references include views such as a[2:]; likewise, np.split only creates views and does not actually copy the data into separate arrays.

3. When multiplying a matrix by a diagonal matrix, use element-wise multiplication instead of a matrix product; it can be tens to hundreds of times faster: write M * v rather than M.dot(np.diag(v)).

4. Reuse memory where possible. For example,

    sqrtw = np.sqrt(w)    # w is not used afterwards

spends extra time allocating memory for sqrtw. It can be rewritten as

    np.sqrt(w, w)    # in-place sqrt: the second argument is the output array
    sqrtw = w        # give the result a readable name as a reference

Similarly,

    a = b + c    # b is never used later

can be rewritten as

    b += c
    a = b

5. Use IPython's %run -p prog.py to profile the program and find the most time-consuming functions. You can also write a simple timer class that prints how long each stage takes.

6. Reduce the real code to a skeleton that uses the same amount of memory and the same number of operations, and use it to estimate the algorithm's time and space requirements beforehand. This can also be done one block at a time. For example:

    # ... complex and slow routine to compute V11, wsum, gwmean ...
    for i in range(noncore_size):
        wi = wsum[i]
        vw = V11.T * wi
        vwv = vw.dot(V11)
        V21[i] = np.linalg.inv(vwv).dot(vw.dot(gwmean[i]))

You can write a test.py that initializes V11, wsum and gwmean randomly with np.random.randn() and then runs only this block. This shows roughly how much memory is required and how long each iteration takes, without first spending a long time computing these variables.

7. If you are on Windows, turn off automatic installation of Windows updates. Otherwise you may leave a program running all night, only to check the results in the morning and find that Windows has restarted itself... which makes you want to cry.
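The save-as-you-go tip can be sketched as a loop that writes each chunk's result to its own .npy file as soon as it is computed and skips chunks whose file already exists, so a crashed run can simply be restarted. The chunk size, the file naming scheme and the stand-in computation process_chunk are hypothetical; writing to a temporary name and renaming afterwards is one way (an assumption, not from the original) to avoid leaving a half-written file behind.

```python
import os
import tempfile

import numpy as np


def process_chunk(chunk):
    # Hypothetical stand-in for the expensive per-chunk computation.
    return np.sqrt(np.abs(chunk)).sum(axis=1)


def run_with_checkpoints(data, out_dir, chunk_size=1000):
    """Process `data` in row chunks, saving each chunk's result as soon as it is ready."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for start in range(0, len(data), chunk_size):
        path = os.path.join(out_dir, f"result_{start:09d}.npy")
        if not os.path.exists(path):  # skip chunks finished by an earlier run
            result = process_chunk(data[start:start + chunk_size])
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:  # a file object, so np.save adds no suffix
                np.save(f, result)
            os.replace(tmp, path)  # rename only once the file is complete
        paths.append(path)
    # Reassemble the full result from the pieces saved on disk.
    return np.concatenate([np.load(p) for p in paths])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((5000, 10))
    with tempfile.TemporaryDirectory() as d:
        full = run_with_checkpoints(data, d)
        print(full.shape)  # (5000,)
```

If the run dies halfway, calling run_with_checkpoints again with the same out_dir recomputes only the missing chunks.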
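The point about views keeping memory alive can be checked directly. A slice or the pieces returned by np.split share the original buffer, so del on the original name alone does not free it; copying the part you need and deleting the views does. This is a minimal sketch; the array sizes are arbitrary.

```python
import numpy as np

big = np.zeros((1000, 1000))  # about 8 MB of float64

tail = big[2:]            # a view: shares big's buffer
parts = np.split(big, 4)  # also views; no data is copied
print(np.shares_memory(tail, big))                   # True
print(all(np.shares_memory(p, big) for p in parts))  # True

# Deleting the name `big` does not free the buffer while views still refer to it.
del big
print(tail.base.nbytes)  # 8000000: the whole original buffer is still alive

# Keep an independent copy of the part you need, then drop every view.
tail_copy = tail.copy()
del tail, parts          # no references remain, so the 8 MB buffer can be freed
print(tail_copy.base is None)  # True: the copy owns its own memory
```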
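The diagonal-matrix tip works because M.dot(np.diag(v)) scales column j of M by v[j], which is exactly what broadcasting does in M * v when v has one entry per column. The sketch below checks that both give the same result and times them; the sizes are arbitrary and the actual speedup depends on the machine and the BLAS library.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((1000, 1000))
v = rng.standard_normal(1000)

t0 = time.perf_counter()
slow = M.dot(np.diag(v))  # builds a dense 1000 x 1000 matrix, then a full matrix product
t1 = time.perf_counter()
fast = M * v              # broadcasting: column j of M is scaled by v[j]
t2 = time.perf_counter()

print(np.allclose(slow, fast))  # True
print(f"M.dot(np.diag(v)): {t1 - t0:.4f} s   M * v: {t2 - t1:.4f} s")

# Multiplying by the diagonal matrix on the left scales rows instead.
print(np.allclose(np.diag(v).dot(M), v[:, None] * M))  # True
```

The broadcast version also avoids allocating the n x n diagonal matrix, which matters as much as the speed when n is large.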
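The memory-reuse rewrites can be expressed with a ufunc's out argument and augmented assignment. A minimal sketch, assuming w is a float array that is not needed afterwards (an integer array could not hold the square roots in place) and that b is not used after a = b + c.

```python
import numpy as np

w = np.random.rand(1_000_000)  # float64 and non-negative, so sqrt can be stored in place
expected = np.sqrt(w)          # allocating version, kept only to check the result

result = np.sqrt(w, out=w)     # same as np.sqrt(w, w): writes into w's own buffer
sqrtw = w                      # readable alias; no new array is allocated
print(result is w)             # True: a ufunc returns the array passed as out
print(np.allclose(sqrtw, expected))  # True

b = np.ones(5)
c = np.arange(5.0)
b += c                         # in-place add: no temporary array for b + c
a = b                          # a is just another name for b's array
print(a)                       # [1. 2. 3. 4. 5.]
```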
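The simple timer class mentioned in the profiling tip could look like the following context manager. Timer is a hypothetical helper, not part of any library; it uses time.perf_counter, prints the elapsed time of each labelled stage, and keeps the value in an attribute so it can be logged.

```python
import time


class Timer:
    """Hypothetical helper: prints how long a labelled block of code took."""

    def __init__(self, label):
        self.label = label
        self.elapsed = None

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed = time.perf_counter() - self.start
        print(f"{self.label}: {self.elapsed:.3f} s")
        return False  # do not swallow exceptions raised inside the block


if __name__ == "__main__":
    import numpy as np

    with Timer("build matrix"):
        m = np.random.rand(500, 500)
    with Timer("matrix product"):
        m.dot(m)
```

Outside IPython, the standard-library profiler gives a similar per-function report: python -m cProfile -s cumtime prog.py.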
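A test.py for the skeleton-evaluation tip might look like this. The sizes n, k and noncore_size are made-up placeholders, and the shapes (V11 as n x k, wsum and gwmean as noncore_size x n) are inferred from how the loop uses them rather than stated in the original. The loop body follows the author's block, using range instead of Python 2's xrange. It reports the memory taken by the random inputs and the average time per iteration, which can be scaled up to estimate the full run.

```python
import time

import numpy as np

np.random.seed(0)  # repeatable random stand-ins

# Hypothetical sizes: replace with the real problem's dimensions.
n, k, noncore_size = 5000, 50, 20

# Random stand-ins for the expensive inputs (shapes inferred from the loop).
V11 = np.random.randn(n, k)
wsum = np.random.randn(noncore_size, n)
gwmean = np.random.randn(noncore_size, n)
V21 = np.empty((noncore_size, k))

input_mb = (V11.nbytes + wsum.nbytes + gwmean.nbytes) / 1e6
print(f"inputs: {input_mb:.1f} MB")

start = time.perf_counter()
for i in range(noncore_size):
    wi = wsum[i]
    vw = V11.T * wi      # k x n, same as V11.T.dot(np.diag(wi))
    vwv = vw.dot(V11)    # k x k
    V21[i] = np.linalg.inv(vwv).dot(vw.dot(gwmean[i]))
per_iter = (time.perf_counter() - start) / noncore_size
print(f"{per_iter * 1e3:.2f} ms per iteration")
```

Multiplying per_iter by the real noncore_size gives a rough running-time estimate before committing to the full computation.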
