1. Performance evaluation
Time Measurement method
The simplest way to measure time in R is the system.time() function.
    system.time(expr, gcFirst = TRUE)
This function evaluates the expression expr and returns how long the evaluation took. The gcFirst argument specifies whether a garbage collection is performed immediately before the expression is timed.
    do.stuff <- function() {
      a <- 1:100000
      for (i in 1:100000) {
        a[i] <- a[i]^2
      }
      a
    }
    system.time(do.stuff())
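The value returned by system.time() can also be stored and inspected rather than just printed. The following is a minimal sketch; square_loop is a hypothetical helper name introduced only for illustration, and the actual timings depend on the machine.

```r
# square_loop is a hypothetical stand-in for the code being timed
square_loop <- function(n) {
  a <- numeric(n)
  for (i in seq_len(n)) {
    a[i] <- i^2
  }
  a
}

# The assignment inside system.time() is evaluated in the calling environment,
# so `result` is available afterwards
timing <- system.time(result <- square_loop(100000))

names(timing)        # includes "user.self", "sys.self" and "elapsed"
timing[["elapsed"]]  # wall-clock seconds as a plain number
```

Extracting the elapsed component this way makes it easy to compare several implementations in one script.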
Memory monitoring methods
The gc() function in R does two things: it triggers a garbage collection immediately, and it displays statistics on current memory usage.
    gc()
In the output, used is the current usage, gc trigger is the level at which the next garbage collection will be triggered, and max used is the maximum usage since R started or since the last call to gc(reset = TRUE). Each (Mb) column gives the preceding Ncells or Vcells count converted to megabytes.
Ncells counts cons cells, each of which takes 28 bytes on 32-bit R and 56 bytes on 64-bit R. I was using 32-bit R, so 2616689 * 28 / 1024^2 = 69.9 Mb.
Vcells counts vector cells, each of which takes 8 bytes, so 63817864 * 8 / 1024^2 = 486.9 Mb.
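The conversions above can be reproduced directly in R. The sketch below first redoes the arithmetic with the figures quoted in the text, then applies the same 8-bytes-per-cell conversion to the Vcells count of the current session; the live numbers will differ on every machine.

```r
# Figures quoted in the text (32-bit R: 28 bytes per cons cell)
ncells_mb <- 2616689 * 28 / 1024^2
vcells_mb <- 63817864 * 8 / 1024^2
round(ncells_mb, 1)  # 69.9
round(vcells_mb, 1)  # 486.9

# The same conversion applied to the current session's Vcells count
g <- gc()
live_vcells_mb <- g["Vcells", "used"] * 8 / 1024^2
live_vcells_mb
g["Vcells", 2]       # the (Mb) column that gc() prints next to "used"
```

The value computed by hand and the (Mb) value reported by gc() should agree to within the one decimal place that gc() reports.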
I do not fully understand which R objects Ncells and Vcells refer to, and I could not find a precise explanation online, so I am unsure how best to translate the terms. If you know, please tell me. Thank you!
The object.size() function in R shows how much memory an individual object occupies.
    object.size(1)
    object.size(train)  # train: an object from the author's own session, not defined in this post
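Since train is not defined in this post, here is a self-contained sketch with two made-up vectors (small and large are names chosen only for illustration). It also shows the units argument, which makes larger sizes easier to read.

```r
small <- numeric(10)    # 10 doubles
large <- numeric(1e6)   # one million doubles, roughly 8 MB of data

object.size(small)                        # size in bytes
print(object.size(large), units = "Mb")   # same measurement, printed in megabytes
format(object.size(large), units = "Kb")  # formatted as a character string
```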
The memory.profile() function in R shows how memory is used by each type of object.
    memory.profile()
Note that memory.profile() reports cons-cell (Ncells) counts: the number of Ncells in use reported by gc() is very close to the sum of the counts returned by memory.profile().
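A quick way to check this relationship in your own session is to put the two numbers side by side. This is only a sketch: the two calls run at slightly different moments, so the values are expected to be close rather than identical.

```r
profile_counts <- memory.profile()     # cell counts, one entry per object type
ncells_used <- gc()["Ncells", "used"]  # cons cells currently in use

sum(profile_counts)  # total across all object types
ncells_used          # compare with the total above
```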
The memory.size() function (Windows only) shows the amount of memory currently used by R. Set max = TRUE to see instead the maximum amount of memory R has obtained from the operating system during the session.
    memory.size()
    memory.size(max = TRUE)
Time performance analysis
R provides the Rprof() function, which samples a running R program to show where it spends its time.
    Rprof(filename = "Rprof.out", append = FALSE, interval = 0.02, memory.profiling = FALSE)
filename: path of the output file
append: whether to append to an existing file or overwrite it
interval: sampling interval, in seconds
memory.profiling: whether to write memory information to the file
Start profiling with Rprof(filename).
Stop profiling with Rprof(NULL). Note that calling Rprof() with no arguments does not stop profiling; it starts profiling to the default file "Rprof.out".
The summaryRprof() function summarizes the results collected by Rprof().
    summaryRprof(filename = "Rprof.out", chunksize = 5000, memory = c("none", "both", "tseries", "stats"), index = 2, diff = TRUE, exclude = NULL)
filename: path of the profiling output file
chunksize: number of lines read at a time
memory: how memory information is reported: "none" shows no memory information, "both" shows time and memory information, "tseries" shows memory usage as a time series, and "stats" shows memory usage statistics
index: how to summarize the stack trace for memory information
diff: whether memory summaries show the change in memory usage rather than the total memory in use
exclude: functions to exclude from the summary
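Putting the two functions together, a minimal profiling session might look like the sketch below. The workload slow_squares and the temporary file are assumptions made for illustration; the loop simply keeps R busy for about one second so that the profiler has samples to record.

```r
# slow_squares is a hypothetical workload, used only to give the profiler something to sample
slow_squares <- function(n) {
  v <- numeric(0)
  for (i in seq_len(n)) {
    v[i] <- i^2
  }
  v
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.02)  # start profiling
start <- proc.time()[["elapsed"]]
while (proc.time()[["elapsed"]] - start < 1) {
  slow_squares(10000)              # keep busy for about one second
}
Rprof(NULL)                        # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)                 # time spent in each function itself
prof$sampling.time                 # total profiled time, in seconds
```

The by.self table is usually the first place to look, since it ranks functions by the time spent in their own code rather than in the functions they call.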
For detailed worked examples of these two functions, see this article on performance monitoring: http://blog.fens.me/r-perform-rprof-profr/.
Memory Performance Analysis
R provides the Rprofmem() function for profiling memory allocations. It is only available if R was built with memory profiling enabled.
    Rprofmem(filename = "Rprofmem.out", append = FALSE, threshold = 0)
filename: path of the output file
append: whether to append to an existing file or overwrite it
threshold: only allocations larger than this value, in bytes, are logged
Start profiling with Rprofmem(filename).
Stop profiling with Rprofmem(NULL).
To view the results, read the output file directly. The following example comes from the function's help page:
    Rprofmem("Rprofmem.out", threshold = 1000)
    example(glm)
    Rprofmem(NULL)
    noquote(readLines("Rprofmem.out", n = 5))
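Because Rprofmem() depends on how R was built, it can help to check for support before calling it. The sketch below uses capabilities("profmem") for that check and writes to a temporary file; the 1e5-element vector is just an arbitrary allocation chosen to exceed the threshold.

```r
mem_file <- tempfile(fileext = ".out")

if (capabilities("profmem")) {
  Rprofmem(mem_file, threshold = 1000)  # log allocations larger than 1000 bytes
  x <- numeric(1e5)                     # an allocation of roughly 800 KB
  Rprofmem(NULL)                        # stop logging
  print(noquote(readLines(mem_file, n = 5)))
} else {
  message("This build of R does not support Rprofmem()")
}
```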
2. Optimize R code
Using vectorized operations
One of R's great strengths is its support for vectorized operations, which are more efficient than element-by-element iteration. For comparison, here is a loop-based implementation of the square operation:
    square.two <- function(n) {
      v <- numeric(0)
      length(v) <- n
      for (i in 1:n) {
        v[i] <- i^2
      }
      v
    }
    square.two(10)
    system.time(square.two(10000))
    system.time(square.two(100000))
    system.time(square.two(1000000))
As you can see, the time consumed increases linearly with the length of the vector.
The same square operation implemented with a vectorized operation looks like this:
    better.square <- function(n) {
      (1:n)^2
    }
    better.square(10)
    system.time(better.square(10000))
    system.time(better.square(100000))
    system.time(better.square(1000000))
As you can see, vector operations are much faster than the previous loop implementation.
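To see the difference on your own machine, the two versions can be timed side by side. This sketch repeats both definitions so that it runs on its own, and it also checks that they return the same values; the timings themselves will vary.

```r
# Loop-based version from the text
square.two <- function(n) {
  v <- numeric(0)
  length(v) <- n
  for (i in 1:n) {
    v[i] <- i^2
  }
  v
}

# Vectorized version from the text
better.square <- function(n) {
  (1:n)^2
}

loop_time <- system.time(a <- square.two(1000000))[["elapsed"]]
vec_time <- system.time(b <- better.square(1000000))[["elapsed"]]

all.equal(a, b)                             # TRUE: both versions compute the same values
c(loop = loop_time, vectorized = vec_time)  # elapsed seconds for each version
```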
Using built-in functions
In most cases, built-in functions perform better than code you write yourself. Built-in functions in R are often implemented as compiled code written in other languages (usually C and Fortran), and they are much more efficient than interpreted R code. In fact, the vectorized operation above can itself be seen as a use of R's built-in functions.
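As one small illustration of this point, the sketch below compares a hand-written summation loop with the built-in sum(). The function manual_sum is a hypothetical example written for this comparison, not something from the reference.

```r
# manual_sum is a hypothetical hand-written equivalent of sum()
manual_sum <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value
  }
  total
}

x <- as.numeric(1:1000000)

system.time(s1 <- manual_sum(x))  # interpreted R loop
system.time(s2 <- sum(x))         # built-in, implemented in compiled code

all.equal(s1, s2)                 # TRUE: both give the same total
```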
Memory Pre-allocation
Repeatedly requesting memory slows a program down, which is true in many programming languages. R does not require you to allocate memory for an object in advance, but doing so can speed up the computation. Taking the square operation as an example, with a large amount of data there is a noticeable difference in running time between the version that pre-allocates memory and the version that does not. Of course, we already know a much faster way to do this: vectorized operations.
    # Grows the vector one element at a time
    square.one <- function(n) {
      v <- numeric(0)
      for (i in 1:n) {
        v[i] <- i^2
      }
      v
    }
    # Allocates the full length before the loop
    square.two <- function(n) {
      v <- numeric(0)
      length(v) <- n
      for (i in 1:n) {
        v[i] <- i^2
      }
      v
    }
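The two functions can be compared with system.time(). The sketch below repeats both definitions so that it is self-contained; how large the gap turns out to be depends on the R version and the machine, so no particular numbers are claimed here.

```r
# Grows the vector one element at a time
square.one <- function(n) {
  v <- numeric(0)
  for (i in 1:n) {
    v[i] <- i^2
  }
  v
}

# Allocates the full length before the loop
square.two <- function(n) {
  v <- numeric(0)
  length(v) <- n
  for (i in 1:n) {
    v[i] <- i^2
  }
  v
}

n <- 100000
t_one <- system.time(r1 <- square.one(n))[["elapsed"]]
t_two <- system.time(r2 <- square.two(n))[["elapsed"]]

all.equal(r1, r2)                      # TRUE: same result either way
c(grow = t_one, preallocate = t_two)   # elapsed seconds for each version
```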
Lookup performance
There are several ways to look up an element of a vector or list in R: lookup by index, lookup by name with single brackets, exact lookup by name with double brackets, partial-match lookup by name with double brackets, and so on.
The time complexity of these lookup methods:
Lookup by index: O(1);
Lookup by name with single brackets: O(n);
Exact lookup by name with double brackets (the default): O(1);
Partial-match lookup by name with double brackets: O(1).
This is another place where the efficiency of an R program can be improved while writing it.
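Rather than taking these figures on trust, the lookup methods can be timed directly. The following sketch builds a named list (the names key1, key2, ... and the sizes are arbitrary choices for illustration) and repeats each kind of lookup many times; results will vary with the R version and the size of the list.

```r
n <- 10000
values <- as.list(seq_len(n))
names(values) <- paste0("key", seq_len(n))
last_key <- paste0("key", n)  # the name of the final element
reps <- 2000                  # repeat each lookup to get a measurable time

by_index <- system.time(for (k in seq_len(reps)) values[[n]])[["elapsed"]]
by_single <- system.time(for (k in seq_len(reps)) values[last_key])[["elapsed"]]
by_double <- system.time(for (k in seq_len(reps)) values[[last_key]])[["elapsed"]]
by_partial <- system.time(
  for (k in seq_len(reps)) values[[last_key, exact = FALSE]]
)[["elapsed"]]

# Elapsed seconds for each lookup style
c(index = by_index, single = by_single, double = by_double, partial = by_partial)
```

Changing n and rerunning gives a rough picture of how each lookup style scales with the number of elements.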
An illustration of exact and partial-match lookup:
    diary <- list(milk = "1 gallon", butter = "1 pound", eggs = 12)
    diary[["milk"]]
    diary[["mil"]]
    diary[["mil", exact = FALSE]]
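For reference, this is what those three lookups return, written as a self-contained sketch. The last line goes slightly beyond the example above and shows that the $ operator on a list also matches names partially.

```r
diary <- list(milk = "1 gallon", butter = "1 pound", eggs = 12)

diary[["milk"]]                # "1 gallon": exact match on the full name
is.null(diary[["mil"]])        # TRUE: exact matching finds no element named "mil"
diary[["mil", exact = FALSE]]  # "1 gallon": "mil" partially matches "milk"
diary$mil                      # "1 gallon": $ also matches partially
```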
Reference: "R in a Nutshell". This book works well as an introductory text.
Questions and suggestions are welcome. If you repost this article, please cite the source. Thank you!
Improve the performance of the R language program