Improving the performance of R programs

Source: Internet
Author: User

1. Performance evaluation

Time measurement methods

The simplest way to measure time in R is the system.time function.

system.time(expr, gcFirst = TRUE)

This function evaluates the expression expr and reports how long it took, without noticeably slowing the program down. The gcFirst argument specifies whether a garbage collection is performed immediately before the expression is timed.

do.stuff <- function() {
  a <- 1:100000
  for (i in 1:100000) {
    a[i] <- a[i]^2
  }
  a
}
system.time(do.stuff())
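As a small sketch of how the result of system.time can be used, the block below stores the timing and picks out the elapsed component. The function name slow.square and the variables timing and result are made up for this illustration; the actual numbers depend on the machine.

```r
# A deliberately slow, loop-based squaring function (hypothetical name).
slow.square <- function(n) {
  a <- numeric(n)
  for (i in seq_len(n)) {
    a[i] <- i^2
  }
  a
}

# system.time() returns a named vector of timings in seconds.
timing <- system.time(result <- slow.square(100000))
print(timing)               # user, system and elapsed time
print(timing[["elapsed"]])  # wall-clock time only
```

Because the assignment happens inside the timed expression, result is still available afterwards.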

Memory monitoring methods

The gc() function in R does two things: it performs a garbage collection immediately, and it displays statistics on memory usage.

gc()

In the output, used is the current usage, gc trigger is the level at which the next garbage collection will be triggered, and max used is the maximum used since the last call to gc(reset = TRUE) or since R was started. The (Mb) columns show the Ncells and Vcells figures converted to megabytes.

Ncells counts cons cells, which take 28 B each in 32-bit R and 56 B each in 64-bit R. I was using 32-bit R, so 2616689*28/(1024^2) = 69.9.

Vcells counts vector cells, which take 8 B each, so 63817864*8/(1024^2) = 486.9.
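The arithmetic above can be reproduced with a few lines of R. This is only a sketch: the helper name cells.to.mb is invented here, and choosing 28 or 56 bytes from the pointer size is an assumption based on the cell sizes quoted in the text.

```r
# Convert a cell count to megabytes (hypothetical helper).
cells.to.mb <- function(cells, bytes.per.cell) {
  cells * bytes.per.cell / 1024^2
}

# The figures quoted in the text (32-bit R).
ncells.mb <- cells.to.mb(2616689, 28)
vcells.mb <- cells.to.mb(63817864, 8)
print(round(c(Ncells = ncells.mb, Vcells = vcells.mb), 1))

# The same conversion applied to the current session.
g <- gc()
ncell.bytes <- if (.Machine$sizeof.pointer == 8) 56 else 28  # assumption
print(cells.to.mb(g["Ncells", 1], ncell.bytes))  # compare with g["Ncells", 2]
print(cells.to.mb(g["Vcells", 1], 8))            # compare with g["Vcells", 2]
```

The values that gc() prints in its (Mb) columns are rounded to one decimal place, so small differences from the computed figures are expected.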

I do not fully understand which R objects Ncells and Vcells refer to, and I could not find a precise explanation online, so I have left the two terms untranslated. If you know, please tell me, thank you!

The object.size() function in R shows how much memory an object occupies.

object.size(1)
object.size(train)
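Since the train object is not defined in this article, here is a self-contained sketch of object.size() on two vectors created on the spot. The variable names small, large and size.text are made up for the example.

```r
small <- numeric(1000)     # 1,000 doubles
large <- numeric(1000000)  # 1,000,000 doubles

print(object.size(small))                # size in bytes
print(object.size(large), units = "Mb")  # size in megabytes

# format() returns the same information as a character string.
size.text <- format(object.size(large), units = "Mb")
print(size.text)
```

A double takes 8 bytes, so the larger vector should come out at roughly 8 million bytes plus a small header.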

The memory.profile() function in R shows how memory is used by the different object types.

memory.profile()

Note that memory.profile() reports Ncells counts: the number of Ncells used that gc() reports is very close to the sum of the counts from memory.profile().
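One way to check this observation in your own session is to put the two numbers side by side. This is only a sketch; the variable names are made up, and the two figures are not expected to match exactly.

```r
profile <- memory.profile()  # cell counts, named by object type
print(profile)

total.cells <- sum(profile)
ncells.used <- gc()["Ncells", 1]  # first column of gc() is "used"

print(c(memory.profile = total.cells, gc.Ncells = ncells.used))
```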



On Windows, the memory.size() function shows the amount of memory currently used by R. With the parameter max = TRUE it instead reports the maximum amount of memory obtained from the operating system in this session.

memory.size()
memory.size(max = TRUE)


Time performance analysis

R provides the Rprof() function, which records where an R program spends its time.

Rprof(filename = "Rprof.out", append = FALSE, interval = 0.02, memory.profiling = FALSE)

filename: the output file path

append: whether to append to an existing file or overwrite it

interval: the time interval between samples, in seconds

memory.profiling: whether to write memory information to the file

To start profiling, call Rprof(filename).

To stop profiling, call Rprof(NULL).

The summaryRprof() function displays the results collected by Rprof().

summaryRprof(filename = "Rprof.out", chunksize = 5000, memory = c("none", "both", "tseries", "stats"), index = 2, diff = TRUE, exclude = NULL)

filename: the output file path

chunksize: the number of lines read at a time

memory: how memory information is displayed: "none" shows none, "both" shows time and memory information, "tseries" shows it as a time series, and "stats" shows memory consumption statistics

index: how to summarize the stack trace for memory information

diff: whether the memory statistics show changes in memory usage or total memory usage

exclude: functions to exclude from the summarized results

For detailed examples of performance monitoring with these two functions, see this article: http://blog.fens.me/r-perform-rprof-profr/.
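A minimal sketch of the start, stop and summarize cycle might look like the following. The functions square.loop and work are invented for this illustration, and the half-second busy loop is only there so that the sampling profiler has something to record; it assumes an R build with profiling enabled.

```r
# Loop-based squaring, used here only as something to profile.
square.loop <- function(n) {
  v <- numeric(n)
  for (i in seq_len(n)) v[i] <- i^2
  v
}

# Keep the CPU busy for a fixed time so the profiler collects samples.
work <- function(seconds = 0.5) {
  start <- proc.time()[["elapsed"]]
  while (proc.time()[["elapsed"]] - start < seconds) {
    square.loop(10000)
  }
  invisible(NULL)
}

prof.file <- tempfile(fileext = ".out")
Rprof(prof.file, interval = 0.02)  # start profiling
work()
Rprof(NULL)                        # stop profiling

prof <- summaryRprof(prof.file)
print(head(prof$by.self))          # time spent in each function itself
```

Writing to a temporary file avoids overwriting an existing Rprof.out in the working directory.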

Memory Performance Analysis

R provides the Rprofmem() function, which logs memory allocations to a file.

Rprofmem(filename = "Rprofmem.out", append = FALSE, threshold = 0)

filename: the output file path

append: whether to append to an existing file or overwrite it

threshold: allocations larger than this value are logged, in bytes

To start memory profiling, call Rprofmem(filename).

To stop memory profiling, call Rprofmem(NULL).

To view the results, read the output file directly. The following example comes from the function's help page:

Rprofmem("Rprofmem.out", threshold = 1000)
example(glm)
Rprofmem(NULL)
noquote(readLines("Rprofmem.out", n = 5))
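Rprofmem() only works when R was compiled with memory profiling support, so a cautious version of the example checks capabilities("profmem") first. The sketch below makes that assumption explicit; the variable names has.profmem and mem.file are made up, and the single large allocation stands in for example(glm).

```r
has.profmem <- capabilities("profmem")[["profmem"]]

if (has.profmem) {
  mem.file <- tempfile(fileext = ".out")
  Rprofmem(mem.file, threshold = 1000)  # log allocations above 1000 bytes
  x <- numeric(1000000)                 # one allocation of about 8 MB
  Rprofmem(NULL)                        # stop logging
  print(noquote(readLines(mem.file, n = 5)))
} else {
  message("This build of R was compiled without memory profiling.")
}
```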



2. Optimize R code

Using vectorized operations

One of the great features of R is its support for vectorized operations, which are more efficient than explicit loops.

square.two <- function(n) {
  v <- numeric(0)
  length(v) <- n
  for (i in 1:n) {
    v[i] <- i^2
  }
  v
}
square.two(10)
system.time(square.two(10000))
system.time(square.two(100000))
system.time(square.two(1000000))

As you can see, the time consumed increases linearly with the length of the vector.

Implementing the same squaring function with a vectorized operation gives the following code:

better.square <- function(n) {
  (1:n)^2
}
better.square(10)
system.time(better.square(10000))
system.time(better.square(100000))
system.time(better.square(1000000))


As you can see, vector operations are much faster than the previous loop implementation.
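To check both claims on your own machine, the two approaches can be compared directly. This is a sketch with made-up names (loop.square, vec.square); it first confirms that the results are identical and then prints the elapsed times, which will vary from machine to machine.

```r
# Loop-based version.
loop.square <- function(n) {
  v <- numeric(n)
  for (i in seq_len(n)) v[i] <- i^2
  v
}

# Vectorized version.
vec.square <- function(n) {
  (1:n)^2
}

# Both versions should produce exactly the same vector.
same <- identical(loop.square(1000), vec.square(1000))
print(same)

t.loop <- system.time(loop.square(1000000))[["elapsed"]]
t.vec  <- system.time(vec.square(1000000))[["elapsed"]]
print(c(loop = t.loop, vectorized = t.vec))
```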

Using built-in functions

In most cases, built-in functions perform better than code you write yourself. Built-in functions in R are often implemented as compiled code in other languages (usually C and Fortran), and such functions are much more efficient than interpreted R code. In fact, the vectorized operation above can also be seen as a use of R's built-in functions.
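As one small additional illustration of this point, the sketch below compares the built-in sum() with a hand-written loop. The function manual.sum is made up for the comparison, and the timings depend on the machine and R version.

```r
# A hand-written sum, for comparison with the built-in sum().
manual.sum <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value
  }
  total
}

x <- as.numeric(1:1000000)

# Both give the same answer.
print(manual.sum(x) == sum(x))

t.manual  <- system.time(manual.sum(x))[["elapsed"]]
t.builtin <- system.time(sum(x))[["elapsed"]]
print(c(manual = t.manual, builtin = t.builtin))
```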

Memory Pre-allocation

Repeatedly requesting memory increases running time, which is true in many programming languages. R does not require you to allocate memory for an object in advance, but doing so can speed up the computation. Taking the squaring operation as an example, with a large amount of data there is a significant difference in run time between pre-allocating memory and not pre-allocating it. Of course, we already know a much faster way to do this: vectorized operations.

square.one <- function(n) {
  v <- numeric(0)
  for (i in 1:n) {
    v[i] <- i^2
  }
  v
}
square.two <- function(n) {
  v <- numeric(0)
  length(v) <- n
  for (i in 1:n) {
    v[i] <- i^2
  }
  v
}
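A rough way to measure the difference is sketched below. The names grow and prealloc are made up, and numeric(n) is used as a common alternative to setting length(v). How large the gap is depends on the R version and the machine, so no particular ratio is assumed here.

```r
# Grows the vector one element at a time.
grow <- function(n) {
  v <- numeric(0)
  for (i in seq_len(n)) v[i] <- i^2
  v
}

# Allocates the full vector up front.
prealloc <- function(n) {
  v <- numeric(n)
  for (i in seq_len(n)) v[i] <- i^2
  v
}

# The two functions return the same result.
same <- identical(grow(1000), prealloc(1000))
print(same)

t.grow     <- system.time(grow(1000000))[["elapsed"]]
t.prealloc <- system.time(prealloc(1000000))[["elapsed"]]
print(c(grow = t.grow, prealloc = t.prealloc))
```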

Lookup performance

There are several ways to look up an element of a vector in R: lookup by index, lookup by name with single brackets, exact lookup by name with double brackets, partial-match lookup by name with double brackets, and so on.

The time complexity of these lookup methods is:

Lookup by index: O(1);

Lookup by name with single brackets: O(n);

Exact lookup by name with double brackets (the default): O(1);

Partial-match lookup by name with double brackets: O(1).

This is another point to keep in mind when writing R programs, since it may improve the program's efficiency.

Exact and partial-match lookup are illustrated below:

diary <- list(milk = "1 gallon", butter = "1 pound", eggs = 12)
diary[["milk"]]
diary[["mil"]]
diary[["mil", exact = FALSE]]
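The sketch below runs each of the lookup forms on the same list and stores the results, so the difference between exact and partial matching is visible. The variable names on the left-hand side are made up for the illustration.

```r
diary <- list(milk = "1 gallon", butter = "1 pound", eggs = 12)

by.index   <- diary[[1]]                      # lookup by index
by.single  <- diary["milk"]                   # single brackets: a list of length 1
by.exact   <- diary[["milk"]]                 # double brackets, exact match
no.match   <- diary[["mil"]]                  # exact match fails: NULL
by.partial <- diary[["mil", exact = FALSE]]   # partial match finds "milk"

print(by.exact)
print(is.null(no.match))
print(by.partial)
```

With exact matching, the default, the incomplete name "mil" finds nothing, while exact = FALSE matches it to the only name that starts with "mil".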

Reference: "R in a Nutshell". This book is suitable as an introductory text.

Questions and suggestions are welcome. If you repost this article, please credit the source. Thank you!
