"R Notes" R memory management and garbage cleanup

Source: Internet
Author: User

R is not especially fast at executing commands; after all, it is statistical software first, not a general-purpose programming tool.

Anyone who writes R programs has probably run into the error "cannot allocate vector of size ...". The cause is simple: the program is creating a very large object, such as a big matrix. There are two basic remedies. The first is to add memory and move to a 64-bit system; the second is to change the algorithm so that such a large object is never created. The first is the quickest fix, but the demand for ever larger objects is endless, so it is not a long-term solution. The second addresses the root of the problem: however large an object is, it can be made smaller through divide and conquer, trading time for space, and similar techniques, although the study of algorithms is endless too.

Upgrading hardware and improving algorithms are the perennial ways to solve memory problems, and both are beyond the scope of this article. Here we only discuss R's memory management and garbage cleanup mechanism; once these are understood, a targeted solution can be found for any specific problem.

After hitting a "cannot allocate vector" error, most people quickly find the advice to change "--max-mem-size" (assuming Windows) or to call memory.limit(), and this is indeed the most direct method. The immediate cause of the failure is that there is not enough memory for the new object. R obtains memory the same way any other application does: it asks the operating system, and if it cannot get a contiguous block of the requested size, the allocation fails. Because most people install R with the default options, the amount of memory the operating system is willing to give R is a default setting. In R, memory.size(NA) or memory.limit() shows the maximum memory that can be allocated to R under the current settings. memory.size(F) shows the memory currently used by R, and memory.size(T) shows the memory allocated to it. (Both grow at first; as garbage in R is cleaned up the memory in use drops, while the memory allocated to R generally does not change.)

If memory.limit() returns a small number, the operating system is being too stingy and is reserving memory for other programs instead of R. The workaround is to start R not by double-clicking the icon but by entering "Rgui --max-mem-size=2Gb" in the "Run" dialog (assuming you want to allow 2 GB and that R's installation folder is set correctly in the environment variables); running memory.limit() afterwards shows that the limit has increased. A simpler way is to run memory.limit(2000) directly in R, which has the same effect and does not require restarting R. Note that memory.size() and memory.limit() are Windows-only, and from R 4.2.0 onward they are non-functional stubs.

Unfortunately, in most cases changing this value does not help, because it is already large enough. The allocation fails not because the operating system is being stingy with R, but because the memory is simply not there, and nothing could provide it. At this point you need to understand how R manages memory.

Almost everything R does is carried out through variables, and a variable can hold many kinds of objects. An R object (a matrix, say) lives in two different areas of memory. The first is heap memory, whose basic unit is the "Vcell" of 8 bytes; each new object requests space here, and all values are stored here, much like the heap in C. The second is the cons cells, as in Lisp, which mainly store address information; the smallest unit is generally 28 bytes on a 32-bit system and 56 bytes on a 64-bit system. In R, ls() lists the names of all current objects, and object.size(x) shows how much memory an object x occupies.
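As a small illustration of the inspection tools mentioned above, the following sketch lists the objects in the current environment with their sizes and prints the cell counts reported by gc(). The object x and the helper names env, obj_names, and sizes are made up for this example.

```r
# Hypothetical example object: one million doubles (8 bytes each).
x <- numeric(1e6)

# Names of the objects in the current environment.
env <- environment()
obj_names <- ls(env)

# Size of one object, formatted in a human-readable unit.
size_x <- object.size(x)
print(format(size_x, units = "MB"))

# Size in bytes of every object in the environment, largest first.
sizes <- sort(vapply(mget(obj_names, envir = env),
                     function(obj) as.numeric(object.size(obj)),
                     numeric(1)),
              decreasing = TRUE)
print(sizes)

# gc() reports the cons cells (Ncells) and heap cells (Vcells) in use.
mem <- gc()
print(mem)
```

Sorting by size makes it easy to spot which objects are worth removing or shrinking first.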

If the current objects take up too much memory, you can work on the objects themselves to free more. One useful method is to change an object's storage mode. storage.mode(x) shows the storage mode of an object; a numeric matrix, for example, is "double" by default. If the values are all integers, or even just 0 and 1, there is no need to spend a double on each one; storage.mode(x) <- "integer" converts the matrix to integer type, and the object shrinks to about half its original size.
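The effect of the storage mode change can be checked directly. This is a minimal sketch with a made-up 0-1 matrix m; the exact byte counts depend on the R build, so only the ratio is computed.

```r
# Hypothetical 1000 x 1000 matrix holding only 0 and 1, stored as double.
m <- matrix(c(0, 1), nrow = 1000, ncol = 1000)
mode_before <- storage.mode(m)
size_double <- object.size(m)

# Convert the values to integer storage in place.
storage.mode(m) <- "integer"
mode_after <- storage.mode(m)
size_integer <- object.size(m)

# Ratio of the new size to the old one (close to one half).
ratio <- as.numeric(size_integer) / as.numeric(size_double)
print(c(before = mode_before, after = mode_after))
print(ratio)
```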

A main reason that objects occupy too much memory is that too many intermediate objects are created while writing a program. R is a very convenient language that we typically use to write all kinds of complex models and algorithms; many problems can be solved quickly by building a few matrices and running a series of matrix operations. However, if the large auxiliary matrices are not cleaned up, they stay around and hold memory. So calling rm(x) on intermediate objects is a good habit. If an object holds important information that you do not want to lose, save it to disk first, for example as a CSV file or in a database via RSQLite.
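The habit described above might look like the following sketch, which saves an intermediate matrix to a temporary CSV file before removing it. The names aux_matrix, csv_path, and restored are hypothetical, and a temporary file stands in for wherever the data would really be kept.

```r
# Hypothetical intermediate result: a 100 x 100 auxiliary matrix.
aux_matrix <- matrix(rnorm(1e4), nrow = 100)

# Save it to disk first if it may be needed again.
csv_path <- tempfile(fileext = ".csv")
write.csv(aux_matrix, csv_path, row.names = FALSE)

# Drop the in-memory copy once it is no longer needed.
rm(aux_matrix)

# Later, read it back only if required.
restored <- as.matrix(read.csv(csv_path))
print(dim(restored))
```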

When rm() deletes an object, it only removes the reference held by the variable; the memory is not freed immediately. The object that lost its reference becomes garbage in memory, and the cleanup mechanism is similar to Java's: garbage is discovered automatically from time to time and then collected in one pass. So after rm(), the Windows Task Manager shows that the memory used by the R process is not released at once, only after a while. If you want the deleted object's memory reclaimed immediately, call the garbage collection function gc(). This is usually unnecessary, because R collects garbage automatically when memory runs short. What we need to do is rm() objects that are no longer used, and this should become a habit when writing R programs.
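To see rm() and gc() working together, the sketch below compares the number of Vcells in use before and after removing a large vector. The vector big and its size are made up for the example; the counts themselves vary between sessions, so only the difference is examined.

```r
# Hypothetical large object: five million doubles, roughly 40 MB.
big <- numeric(5e6)

# Vcells in use while the object is still referenced.
before <- gc()["Vcells", "used"]

# Remove the reference, then force a collection right away.
rm(big)
after <- gc()["Vcells", "used"]

# Number of 8-byte Vcells that were reclaimed.
freed <- before - after
print(freed)
```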

Often in a program, and especially in a loop, memory that is not handled properly blows up before garbage cleanup gets a chance to run, so R's memory management must be kept in mind whenever new objects are created. In R the dimensions of a matrix need not be fixed in advance (in many languages an array's length cannot be a variable), which is very convenient, so loops frequently grow a matrix step by step. In fact, every time the matrix grows, a new block of memory has to be allocated, even if the result is assigned to a variable of the same name. Suppose the matrix starts at 100K, becomes 101K on the second step, and keeps growing to 120K: blocks of 100K, 101K, and so on up to 120K are allocated one after another. If you instead allocate the full 120K at the start and fill it in as the loop proceeds, a great deal of memory is saved. The same applies to the cbind function, so be careful not to misuse it inside loops.
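The two loop styles contrasted above can be written side by side. This is a sketch with arbitrary sizes and made-up column values; both loops build the same matrix, but only the first reallocates it on every iteration.

```r
n <- 200  # hypothetical number of columns

# Growing: each cbind() call allocates a new, slightly larger matrix.
grown <- NULL
for (i in seq_len(n)) {
  grown <- cbind(grown, sqrt(i) * seq_len(10))
}

# Preallocating: one allocation up front, then columns are filled in.
pre <- matrix(0, nrow = 10, ncol = n)
for (i in seq_len(n)) {
  pre[, i] <- sqrt(i) * seq_len(10)
}

# Both approaches produce the same values.
same <- isTRUE(all.equal(grown, pre))
print(same)
```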

Dealing with memory problems is straightforward: cultivate the habit of paying attention to memory at all times, estimate the memory needed whenever you create an object or assign in a loop, and clean up large intermediate variables after use. If you really need to create a huge object, consider packages that specialize in handling large in-memory objects and parallel processing, such as bigmemory.

Source: http://www.biostatistic.net/thread-3302-1-1.html

Some people ask a similar question: http://stackoverflow.com/questions/8342986/big-data-process-and-analysis-in-r

In addition to Hadoop, it seems that the colbycol package is one of the options: http://colbycol.r-forge.r-project.org/

Another example (the seventh chapter), via StackOverflow: http://ishare.edu.sina.com.cn/f/23695419.html


