Several useful small R functions

Last Update:2018-08-23 Source: Internet

Author: User

Most of the code I have written recently is R scripts, and R keeps proving more powerful. I now use it for data analysis and some simulation.

Here is a collection of functions I use regularly.

1. Batch-replace values in a data frame

i. Replace all 0 values with 100

res2$valuex[res2$valuex %in% 0] <- 100

ii. Replace NA with 0

res2$valuex[is.na(res2$valuex)] <- 0
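As a quick check of the two replacements, here is a minimal sketch on a made-up data frame. The names res2 and valuex follow the snippets above; the sample values are invented for illustration.

```r
# Hypothetical sample data; only the names res2 and valuex come from the text.
res2 <- data.frame(valuex = c(0, 5, NA, 0, 7))

# %in% returns FALSE (never NA) for missing values, so NA rows are left alone here.
res2$valuex[res2$valuex %in% 0] <- 100

# Then fill the remaining NA values with 0.
res2$valuex[is.na(res2$valuex)] <- 0

print(res2$valuex)
```

The order matters: running the NA replacement first would turn the NA into 0, and the first statement would then change it to 100.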

2. CDF plot

The CDF (cumulative distribution function) is a good tool for clearly understanding the distribution of data.

showcdf <- function(data, field) {
  res_cdf <- ecdf(data)
  plot(res_cdf, main = paste('CDF of', field))

  # Mark the median, upper quartile, upper whisker (the maximum excluding outliers),
  # and twice the upper whisker (drop the last one if it is not useful)
  summarydata <- boxplot.stats(data)$stats
  summarydata[6] <- summarydata[5] * 2

  for (index in 3:length(summarydata)) {
    tempv <- as.numeric(summarydata[index])
    # Cumulative percentage at this value, truncated to two decimals
    r_value <- floor(res_cdf(tempv) * 10000) / 100
    lines(c(tempv, tempv), c(r_value / 100, 0), col = 'red', lwd = 2, lty = 3)
    label <- paste('<-', floor(tempv * 100) / 100, ': ', r_value, '%', sep = '')
    text(tempv, index * 0.15, label, cex = 0.8, adj = c(0, 1))
  }
}

* The following statement returns the values at specific quantiles, for example:

  y <- quantile(data, c(0.5, 0.99))

This extracts the 50% and 99% quantiles of the data.
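The relationship between the two tools can be sketched on invented data: ecdf() maps a value to a cumulative fraction, while quantile() maps a probability back to a value. The sample below (the integers 1 to 100) is an assumption chosen so the numbers are easy to read.

```r
# Invented sample: the integers 1 to 100.
data <- 1:100

# ecdf() returns a function giving the fraction of values <= its argument.
res_cdf <- ecdf(data)
print(res_cdf(25))

# quantile() goes the other way: from a probability to a value.
y <- quantile(data, c(0.5, 0.99))
print(y)
```

Note that quantile() interpolates between data points by default, so the returned values need not be members of the sample.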

3. Read data from MySQL

library('RMySQL')

readdatafrommysql <- function(tablename, targetdate) {
  drv <- dbDriver('MySQL')
  con <- dbConnect(drv, host = 'xxx.xxx.xxx.xxx', port = 3006, username = 'xx', password = 'xxxx', dbname = 'xxxx')

  sqlstatement <- paste("select * from", tablename)
  if (nchar(targetdate) > 0) {
    # Leading space keeps "where" from running into the table name
    sqlstatement <- paste(sqlstatement, " where date='", targetdate, "'", sep = '')
  }
  print(sqlstatement)

  data <- dbGetQuery(con, sqlstatement)
  dbDisconnect(con)
  return(data)
}

* The same approach can be adapted for SQLite or other databases.
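The statement-building step can be tried without a database. The sketch below uses a hypothetical helper, build_query, and an invented table name; it is not part of RMySQL and only shows how the optional date filter changes the SQL text.

```r
# Hypothetical helper that only builds the SQL text; it does not touch a database.
build_query <- function(tablename, targetdate = "") {
  sqlstatement <- paste("select * from", tablename)
  if (nchar(targetdate) > 0) {
    # sprintf keeps the spacing and quoting visible in one template.
    sqlstatement <- sprintf("%s where date='%s'", sqlstatement, targetdate)
  }
  sqlstatement
}

print(build_query("visits"))
print(build_query("visits", "2014-01-13"))
```

Pasting values into SQL text like this is only reasonable for trusted input such as a date typed by the script's author.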

4. Solving optimization problems

To solve the optimization problem from Chapter 3 of Head First Data Analysis, install lp_solve on the system along with the R packages lpSolve and lpSolveAPI.

library(lpSolve)

# Objective coefficients: maximize 5x + 4y
f2.obj <- c(5, 4)
# One constraint per row: x <= 400, y <= 300, 100x + 125y <= 50000
f2.con <- matrix(c(1, 0, 0, 1, 100, 125), nrow = 3, byrow = TRUE)
f2.dir <- c('<=', '<=', '<=')
f2.rhs <- c(400, 300, 50000)

lp('max', f2.obj, f2.con, f2.dir, f2.rhs)$solution

Reference: http://lpsolve.sourceforge.net/5.5/R.htm
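For a problem this small, the result can be cross-checked in base R without lpSolve. The sketch below is not how lp() works internally; it simply scores corner points of the feasible region, which were worked out by hand from the three constraints plus the assumption that both variables are non-negative.

```r
# Same coefficients as the lp() call in the text.
obj <- c(5, 4)
con <- matrix(c(1, 0, 0, 1, 100, 125), nrow = 3, byrow = TRUE)
rhs <- c(400, 300, 50000)

# Candidate corner points (x, y), derived by hand; assumes x >= 0 and y >= 0.
corners <- rbind(c(0, 0), c(400, 0), c(400, 80), c(125, 300), c(0, 300))

# Keep only the corners that satisfy every constraint, then score them.
feasible <- apply(corners, 1, function(p) all(con %*% p <= rhs))
candidates <- corners[feasible, , drop = FALSE]
scores <- candidates %*% obj

best <- candidates[which.max(scores), ]
print(best)
print(max(scores))
```

Because a linear objective over a bounded feasible region attains its maximum at a corner, the best-scoring corner here should agree with the solution reported by lp().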

5. Get parameters when running from the command line

# main entry
args <- commandArgs(trailingOnly = TRUE)

if (length(args) < 1) {
  print("Wrong parameters, please specify the target date!", quote = FALSE)
} else {
  callprocessfunction(args[1])
}

The script can then be run like this:

Rscript xxx.R 2014-01-13

6. Remove outliers using box plot statistics (boxplot)

removeoutdata <- function(data) {
  # boxplot.stats()$out lists the points lying beyond the whiskers
  result <- data[!data %in% boxplot.stats(data)$out]
  return(result)
}
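A small worked example, using an invented vector with one obvious outlier, shows what the function keeps. The function is repeated here so the snippet runs on its own.

```r
removeoutdata <- function(data) {
  data[!data %in% boxplot.stats(data)$out]
}

# Invented sample with one obvious outlier.
x <- c(1, 2, 3, 4, 5, 100)

# Points farther than 1.5 times the hinge spread from the box count as outliers.
outliers <- boxplot.stats(x)$out
cleaned <- removeoutdata(x)

print(outliers)
print(cleaned)
```

Since the whiskers are recomputed from whatever data is passed in, applying the function repeatedly can keep trimming points on heavy-tailed data.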

7. Filter data by string

filterdata <- function(data, url) {
  # grep returns the indices of the rows whose url matches the pattern
  rows <- grep(url, data$url)
  return(data[c(rows), ])
}
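Here is a minimal usage sketch on an invented data frame; only the url column name comes from the function above, and the rt column and sample paths are made up.

```r
filterdata <- function(data, url) {
  rows <- grep(url, data$url)
  data[rows, ]
}

# Invented log-like sample.
logs <- data.frame(
  url = c("/index.html", "/api/users", "/api/orders", "/about.html"),
  rt = c(120, 80, 95, 60),
  stringsAsFactors = FALSE
)

# The pattern is a regular expression, so ^ anchors it to the start of the url.
api_rows <- filterdata(logs, "^/api/")
print(api_rows)
```

For a literal match on strings containing characters such as "." or "?", grep accepts fixed = TRUE.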

8. Drawing with ggplot2

ggplot2 provides very powerful features. A chart that would take many calls with the base plot functions can often be written as a single ggplot2 statement, so it is well worth learning and applying.

9. Bar charts

drawbars <- function(data, xlab) {
  labels <- c("A", "B", "C", "D")

  # Leave 10% headroom above the tallest bar
  maxvalue <- max(max(data$a), max(data$b), max(data$c), max(data$d))
  ylim <- c(0, maxvalue * 1.1)

  datax <- rbind(data$a, data$b, data$c, data$d)
  barplot(t(datax), beside = TRUE, col = terrain.colors(length(data$t0)), offset = 0, names.arg = labels, ylim = ylim, xlab = xlab)
  box()
}

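10. Clustering

datacluster <- function(data, col, clusternum) {
  require("fpc")
  require(cluster)

  # Drop rows with missing values in the selected columns
  z2 <- na.omit(data[, col])

  km <- kmeans(z2, clusternum)

  # Plot the same rows that were clustered, so the labels line up
  clusplot(z2, km$cluster, color = TRUE, shade = TRUE, labels = 2, lines = 0)
}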

* Data visualization can help you analyze problems, for example when analyzing a loading process.

11. Converting factor columns to numeric

In a data frame loaded from a file, some columns may be factors, which cannot be converted to numeric directly. The following functions handle that:

asnumeric <- function(x) as.numeric(as.character(x))

factorsnumeric <- function(d) modifyList(d, lapply(d[, sapply(d, is.factor), drop = FALSE], asnumeric))

These functions are simple to use:

data.x <- asnumeric(data.x)

The key is to convert to a character string first; only then does the conversion yield the correct number.

* If the data contains abnormal values such as NULL, convert them or filter them out before converting.
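Why the detour through character is needed can be seen on a small invented factor: calling as.numeric() directly returns the internal level codes rather than the numbers the labels spell out.

```r
asnumeric <- function(x) as.numeric(as.character(x))

# Invented factor column, as a file reader might produce for a text column.
f <- factor(c("10", "20", "10", "35"))

codes <- as.numeric(f)    # internal level codes, not the values
values <- asnumeric(f)    # the numbers the labels spell out

print(codes)
print(values)
```

Labels that do not parse as numbers become NA with a warning, which is why cleaning abnormal values first is worthwhile.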

If the data contains commas, you can try this:

# Note: this strips every punctuation character, including decimal points and minus signs
asnumeric2 <- function(x) as.numeric(gsub('![[:alnum:]]*[[:space:]]|[[:punct:]]', '', as.character(x)))

12. Operating on columns by name

Accessing values by field name makes the code more flexible, as follows:

as.matrix(res[c('data')]) yields the same values as res$data

This usage avoids the problem with column numbers, which point at the wrong column once the data layout changes. For example:

keys <- c('data_sum', 'data1', 'data2')

for (key in keys) {
  data[c(key)] <- asnumeric(as.matrix(data[c(key)]))  # convert to numeric
  data[c(key)][is.na(data[c(key)]), 1] <- 0  # set all NA to 0
}

13. Inspect the shape of a data distribution

datadistribution <- function(x, na.omit = FALSE) {
  if (na.omit) {
    x <- x[!is.na(x)]
  }

  m <- mean(x)
  n <- length(x)
  s <- sd(x)
  skew <- sum((x - m)^3 / s^3) / n
  kurt <- sum((x - m)^4 / s^4) / n - 3
  return(c(n = n, mean = m, stdev = s, skew = skew, kurtosis = kurt))
}

How to use:

sapply(base_data[c('A', 'B')], datadistribution)
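To make the output concrete, here is a sketch on invented data with one symmetric column and one right-skewed column. The function is restated compactly so the snippet runs on its own; the column names A and B follow the usage line above.

```r
datadistribution <- function(x, na.omit = FALSE) {
  if (na.omit) x <- x[!is.na(x)]
  m <- mean(x); n <- length(x); s <- sd(x)
  c(n = n, mean = m, stdev = s,
    skew = sum((x - m)^3 / s^3) / n,
    kurtosis = sum((x - m)^4 / s^4) / n - 3)
}

# Invented data: A is symmetric around 3, B has a long right tail.
base_data <- data.frame(A = c(1, 2, 3, 4, 5), B = c(1, 1, 2, 3, 20))

# sapply returns a matrix with one row per statistic and one column per field.
stats <- sapply(base_data[c("A", "B")], datadistribution)
print(stats)
```

A skew near 0 suggests symmetry and a positive skew a longer right tail; the kurtosis here is excess kurtosis, so a normal distribution would score about 0.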

14. Group count

The aggregate function handles grouped statistics nicely, but using length directly counts every row in a group. To count only the unique values, define a custom function:

fun <- function(x) { return(length(unique(x))) }

res <- aggregate(values ~ groupby, data = data, FUN = fun)
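The difference between counting rows and counting distinct values can be sketched on a small invented data frame; the column names groupby and values follow the call above.

```r
# Invented sample data.
data <- data.frame(
  groupby = c("a", "a", "a", "b", "b"),
  values = c(1, 1, 2, 3, 3)
)

fun <- function(x) length(unique(x))

# length counts every row in the group; fun counts distinct values only.
rows_per_group <- aggregate(values ~ groupby, data = data, FUN = length)
unique_per_group <- aggregate(values ~ groupby, data = data, FUN = fun)

print(rows_per_group)
print(unique_per_group)
```

With the formula interface, rows containing NA in the listed columns are dropped before the function is applied.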

Another handy tool is the summarise function of the plyr package:

library(plyr)
sdata <- ddply(data, c('Field2'), summarise, n = length(RT), mean = mean(RT), sd = sd(RT), se = sd / sqrt(n))
print('Result of ddply function:')
print(sdata)

15. String operations

String operations in R are often done with regular expressions. To remove the leading and trailing spaces of a string:

trim <- function(x) gsub("^\\s+|\\s+$", "", x)

Here is an example that uses grep to find a string and strsplit to split it:

strval <- trim(temp[j])
if (length(grep('^max-age', strval)) > 0) {
  values <- strsplit(strval, '=')
  data$cache_max_age[i] <- as.numeric(values[[1]][2])
}
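The same steps can be run end to end on a single invented string shaped like one directive of a Cache-Control header. The variable names raw and max_age are illustrative only.

```r
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

# Invented input with stray whitespace on both sides.
raw <- "  max-age=3600 "
strval <- trim(raw)

max_age <- NA
if (length(grep("^max-age", strval)) > 0) {
  # strsplit returns a list with one character vector per input string.
  values <- strsplit(strval, "=")
  max_age <- as.numeric(values[[1]][2])
}
print(max_age)
```

Recent versions of base R also ship trimws(), which covers the same trimming without a custom function.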

16. Date conversion

The following converts a GMT date string to a POSIX date value in seconds:

datetonum <- function(x) as.numeric(as.POSIXct(strptime(trim(x), "%a, %d %b %Y %H:%M:%S GMT", tz = "GMT")))

The format string must match the passed-in string exactly.
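A short check with an invented timestamp close to the epoch makes the result easy to verify by hand. The sketch assumes English weekday and month names, so it switches the time locale to "C" first; the format string and the tz argument are one reasonable reading of the conversion described above, not the only one.

```r
# Weekday and month names are parsed in the current locale; "C" gives English names.
invisible(Sys.setlocale("LC_TIME", "C"))

trim <- function(x) gsub("^\\s+|\\s+$", "", x)

datetonum <- function(x) {
  # tz = "GMT" makes the parsed time GMT instead of the machine's local zone.
  as.numeric(as.POSIXct(strptime(trim(x), "%a, %d %b %Y %H:%M:%S GMT", tz = "GMT")))
}

# 100 seconds after the epoch, written in HTTP-date style.
secs <- datetonum(" Thu, 01 Jan 1970 00:01:40 GMT ")
print(secs)

# A string that does not match the format yields NA rather than an error.
bad <- datetonum("1970-01-01 00:01:40")
print(bad)
```

Checking for NA after conversion is a cheap way to catch inputs whose layout differs from the format string.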

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
