R language | basic functions, statistics, and common operations | data manipulation and cleaning

Source: Internet
Author: User
Tags: abs, natural logarithm, square root

Preface: commonly used R console operations

Help: help(nnet) is equivalent to ?nnet; ??nnet searches the help system
Clear everything displayed in the console: Ctrl+L
Remove all variables from the R workspace: rm(list = ls()), then gc() to release memory
Get or set the current working directory: getwd(), setwd()
Save specified objects to disk or read them back: save(), load()
Read and write files: read.table(), write.table(), read.csv(), write.csv()

1. Some simple basic statistics

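# Basic statistics
sum() / mean() / sd() / min()    # some basic statistics

which.min()        # index of the minimum value

The functions above work on a single column (a vector). For multivariate data:

# Multivariate data
colMeans()     # mean of each column; rowMeans() does the same for rows
colnames()     # column names
colSums()      # column sums
cov()          # covariance matrix
cor()          # correlation matrix
cor.test()     # test of the correlation coefficient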
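As a minimal sketch of the functions listed above, the following uses a made-up vector x for the single-column case and the numeric columns of the built-in iris data set for the multivariate case. The variable names are illustrative only.

```r
# Single vector: basic statistics
x <- c(4, 8, 15, 16, 23, 42)
total   <- sum(x)        # 108
avg     <- mean(x)       # 18
spread  <- sd(x)         # sample standard deviation
idx_min <- which.min(x)  # position of the smallest value: 1

# Multivariate data: the four numeric columns of iris
num <- iris[, 1:4]
col_means <- colMeans(num)   # one mean per column
col_sums  <- colSums(num)    # one sum per column
cov_mat   <- cov(num)        # 4 x 4 covariance matrix
cor_mat   <- cor(num)        # 4 x 4 correlation matrix

# cor.test() tests whether the correlation between two variables is zero
ct <- cor.test(num$Sepal.Length, num$Petal.Length)
ct$estimate   # estimated correlation coefficient
ct$p.value    # p-value of the test
```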

abs                 absolute value
sqrt                square root
exp                 e to the power x
log                 natural logarithm
log2, log10         other logarithms
sin, cos, tan       trigonometric functions
sinh, cosh, tanh    hyperbolic functions
poly                orthogonal polynomials
polyroot            polynomial root-finding
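A short sketch of a few of these mathematical functions. The polynomial passed to polyroot() is an arbitrary example; note that polyroot() expects the coefficients in increasing order of power and returns complex numbers.

```r
abs(-3)        # 3
sqrt(16)       # 4
exp(1)         # Euler's number e
log(exp(2))    # 2, since log() is the natural logarithm
log2(8)        # 3

# Roots of 6 - 5x + x^2 = (x - 2)(x - 3)
roots <- polyroot(c(6, -5, 1))
Re(roots)      # real parts of the roots, approximately 2 and 3
```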

Object operations:

assign      assignment, equivalent to "<-"
rm          delete objects
ls          list the objects in memory
str         display the internal structure or a brief description of an object
ls.str      show details of all objects in memory
length      number of elements in an object
names       names of the data; for a data frame, the column names
levels      levels of a factor vector
dim         dimensions of the data
nrow        number of rows of a matrix or data frame
ncol        number of columns
rownames    row names of the data
colnames    column names
class       class of the data
mode        mode of the data
head        first n rows of the data
tail        last n rows of the data
summary     display a summary of the object
attr        a named attribute of x

Functions that test a variable: is.na (missing values), is.null, is.array, is.data.frame, is.numeric, is.complex
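The object and test functions can be tried on a small example. This is only a sketch: the vector v is invented for illustration, and cars is a built-in data set.

```r
# assign() has the same effect as v <- c(...)
assign("v", c(a = 1, b = NA, c = 3))

length(v)       # 3
names(v)        # "a" "b" "c"
is.na(v)        # TRUE only for element b
is.numeric(v)   # TRUE
is.null(NULL)   # TRUE

# Inspecting a data frame
dim(cars)            # 50 2
nrow(cars)           # 50
ncol(cars)           # 2
is.data.frame(cars)  # TRUE
head(cars, 3)        # first three rows
tail(cars, 3)        # last three rows
str(cars)            # compact description of the structure
```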

Simple statistics:

max              largest element
min              smallest element
range            vector of the minimum and maximum
sum              sum of the elements
prod             product of the elements
pmax             compares vectors position by position and returns the larger values as a new vector
pmin             compares vectors position by position and returns the smaller values as a new vector
cumsum           cumulative sum
cumprod          cumulative product
cummax           cumulative maximum
cummin           cumulative minimum
mean             mean
weighted.mean    weighted mean
median           median
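The difference between the overall and the position-wise or cumulative versions is easiest to see on two short vectors. The values below are arbitrary.

```r
x <- c(3, 1, 4, 1, 5)
y <- c(2, 7, 1, 8, 2)

range(x)     # 1 5
prod(x)      # 60
pmax(x, y)   # 3 7 4 8 5  (larger value at each position)
pmin(x, y)   # 2 1 1 1 2  (smaller value at each position)
cumsum(x)    # 3 4 8 9 14
cumprod(x)   # 3 3 12 12 60
cummax(x)    # 3 3 4 4 5
cummin(x)    # 3 1 1 1 1
median(x)    # 3

# Weighted mean: (1*3 + 2*2 + 3*1) / (3 + 2 + 1) = 10/6
wm <- weighted.mean(c(1, 2, 3), w = c(3, 2, 1))
```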

sd              standard deviation
norm            normal distribution
f               F distribution
unif            uniform distribution
cauchy          Cauchy distribution
binom           binomial distribution
geom            geometric distribution
chisq.test      chi-squared test, test of independence
prop.test       hypothesis test on population proportions
shapiro.test    test of normality
t.test          t test on the population mean
aov             analysis of variance
anova           analysis of variance table for one or more model objects

The distribution names (norm, f, unif, and so on) are combined with the prefixes d, p, q, and r, for example dnorm, pnorm, qnorm, rnorm.
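A hedged sketch of how the distribution and test functions fit together. The seed, sample size, and parameters are arbitrary choices for illustration, so no particular numeric results are claimed.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
x <- rnorm(50, mean = 5, sd = 2)   # r + norm: random normal draws

dnorm(0)       # d + norm: density of the standard normal at 0
pnorm(1.96)    # p + norm: cumulative probability
qnorm(0.975)   # q + norm: quantile, the inverse of pnorm

sh <- shapiro.test(x)     # normality test
tt <- t.test(x, mu = 5)   # one-sample t test against a mean of 5
sh$p.value
tt$p.value

# One-way analysis of variance on the built-in iris data
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)
```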
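2. Vectors

Vectors are widely used in loops.

# Vectors
# Vectors are widely used in loops
m = vector(length = 8); m   # creates a logical vector of length 8 (all FALSE)
m[1] = 1; m                 # assigning a number converts the whole vector to numeric
m[1] = "1"; m               # assigning a string converts the whole vector to character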
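A typical use in a loop is to create the vector first and then fill it element by element. The sketch below assumes a numeric result and uses the illustrative name squares.

```r
n <- 8
squares <- vector("numeric", length = n)   # numeric vector of n zeros

for (i in seq_len(n)) {
  squares[i] <- i^2    # fill position i
}

squares   # 1 4 9 16 25 36 49 64
```

Creating the vector at its final length before the loop avoids growing it on every iteration.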

Using logical vectors

y[y < 0] <- -y[y < 0]      # replaces each negative element of y with its negation; equivalent to y <- abs(y)
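Broken into steps on a small made-up vector, the logical-index assignment works like this:

```r
y <- c(-2, 5, -7, 0, 3)

neg <- y < 0         # logical vector: TRUE FALSE TRUE FALSE FALSE
z <- y
z[neg] <- -z[neg]    # only the positions where neg is TRUE are replaced

z                      # 2 5 7 0 3
identical(z, abs(y))   # TRUE
```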


3. Data storage forms

# Data storage forms
data.frame(wi = iris, ci = cars)   # data frame; variable names can be defined directly
list(wi = iris, ci = cars)         # list; variable names can also be defined directly


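Note: attach(), detach()

attach() places the variables of a data frame on R's search path so that they can be called directly by name; detach() removes them again.

attach(iris)
head(Species)    # the column can now be referenced without iris$
detach(iris)

In data.frame() you can rename the datasets being combined, for example data.frame(x = iris, y = cars).

Rows can be named as well as columns: data.frame() accepts a row.names argument, which takes a vector of row names or a single column (by name or number) to use as the row names.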
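The following sketch puts the storage forms together. The scores data frame and its names are invented for illustration; iris and cars are built-in data sets.

```r
# A list can hold data sets of different sizes under chosen names
datasets <- list(wi = iris, ci = cars)
names(datasets)      # "wi" "ci"
nrow(datasets$ci)    # 50

# A data frame with explicit column names and row names
scores <- data.frame(
  math = c(90, 72, 85),
  art  = c(65, 88, 79),
  row.names = c("ann", "bob", "cy")
)
scores["bob", "art"]   # 88

# attach() makes the columns visible by name until detach() is called
attach(scores)
top <- max(math)       # 90
detach(scores)
```

Because attached names can mask or be masked by other objects, it is safest to detach as soon as the columns are no longer needed.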

4. Data viewing functions: names, str, unique in combination; typeof(), mode(), class()

## Data viewing functions
names(iris)            # names of all variables
str(iris)              # type of each variable (int = integer, num = numeric)
unique(iris$Species)   # distinct levels of a categorical variable
table(iris$Species)    # number of observations at each level (= unique + sum)
summary(iris)          # mean, median, quartiles, maximum, minimum and other statistics for each variable; for a regression model it gives the coefficient table, etc.
attributes(iris)       # includes names (variable names), row.names (row labels), class (data form)

In general, names, str, and unique are used in combination.


How to tell the data type functions typeof(), mode(), and class() apart.

A factor makes a clear example:

> gl(2,5)            # create a factor
 [1] 1 1 1 1 1 2 2 2 2 2
Levels: 1 2
> class(gl(2,5))     # class of the variable, shown as factor
[1] "factor"
> mode(gl(2,5))      # broad data category, shown as numeric
[1] "numeric"
> typeof(gl(2,5))    # fine-grained data type, shown as integer
[1] "integer"
# From: http://f.dataguru.cn/thread-99785-1-1.html

In terms of level of detail: typeof > mode > class.
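To compare the three functions on more object types, a small helper can collect their results side by side. The helper name describe is hypothetical and not part of R.

```r
# Hypothetical helper: returns class, mode, and typeof of an object
describe <- function(x) {
  c(class = class(x)[1], mode = mode(x), typeof = typeof(x))
}

describe(gl(2, 5))    # "factor"     "numeric"   "integer"
describe(1:3)         # "integer"    "numeric"   "integer"
describe(c(1.5, 2))   # "numeric"    "numeric"   "double"
describe("a")         # "character"  "character" "character"
describe(iris)        # "data.frame" "list"      "list"
describe(mean)        # "function"   "function"  "closure"
```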



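5. Matrix basics and points to note

# Matrix basics
t()        # transpose
det()      # determinant of a square matrix
x %*% y    # inner product of vectors (matrix multiplication)
x %o% y    # outer product of vectors


a = array(1:9, dim = c(3, 3))
a * a                # element-by-element multiplication
a %*% a              # the matrix product, which is usually what we want
crossprod(a, a)      # equals t(a) %*% a
crossprod(t(a), a)   # equals a %*% a, which is why t(a) is needed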
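The identities in the comments above can be checked directly. This is a small sketch; the vectors u and v are added only to illustrate the inner and outer products.

```r
a <- array(1:9, dim = c(3, 3))   # filled column by column

(a * a)[2, 3]     # element-wise: 8 * 8 = 64
(a %*% a)[2, 3]   # row 2 (2, 5, 8) times column 3 (7, 8, 9) = 126

all.equal(crossprod(a, a), t(a) %*% a)      # TRUE
all.equal(crossprod(t(a), a), a %*% a)      # TRUE

u <- 1:3
v <- 4:6
u %*% v    # inner product as a 1 x 1 matrix: 32
u %o% v    # outer product: 3 x 3 matrix with entries u[i] * v[j]
```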


t           matrix transpose
rowSums     row sums
colSums     column sums
rowMeans    row means
colMeans    column means
solve       solve a system of linear equations or find the inverse of a matrix

6. Factors

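## Factors (roughly a combination of text and numbers)
# Similar in spirit to value labels in SPSS
m = factor(c(1, 0), labels = c("M", "F")); m   # converts to a factor and defines value labels (labels follow the sorted levels: 0 -> "M", 1 -> "F")
m = as.factor(iris$Species); m                 # factor() above is more flexible, because as.factor() can only convert to factor format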
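Because labels are matched to levels in order, stating the levels explicitly makes the mapping unambiguous. A minimal sketch with invented codes (1 for "M", 0 for "F"):

```r
# Explicit levels: code 1 is labelled "M", code 0 is labelled "F"
sex <- factor(c(1, 0, 0, 1), levels = c(1, 0), labels = c("M", "F"))

sex               # M F F M, with levels M F
levels(sex)       # "M" "F"
table(sex)        # counts per level: M 2, F 2
as.integer(sex)   # underlying integer codes: 1 2 2 1

# as.factor() only converts; levels are taken from the sorted unique values
f <- as.factor(c("low", "high", "low"))
levels(f)         # "high" "low"
```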


7. Input and output

library        load a package
data           load a built-in dataset
load           load data saved with save or save.image
read.table     read a table
read.csv       read a comma-separated table
read.delim     read a tab-delimited table
read.fwf       read fixed-width formatted data into a table
save           save the specified objects in binary form
save.image     save all objects in the current workspace in binary form
write.table    write data in tabular form
write.csv      write data as a CSV table
cat            output text after coercing it to character
sink           divert output to a specified file
print          print to the screen
format         format an object for printing
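A round trip through a CSV file and a binary save file can be sketched as follows. The files are created with tempfile() so nothing is written to the working directory; the object name speeds is illustrative.

```r
# Text format: write a data frame to CSV and read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(cars, csv_path, row.names = FALSE)   # omit the row-name column
back <- read.csv(csv_path)
dim(back)                                      # same shape as cars: 50 2

# Binary format: save an object, remove it, then restore it with load()
bin_path <- tempfile(fileext = ".RData")
speeds <- cars$speed
save(speeds, file = bin_path)
rm(speeds)
load(bin_path)                                 # recreates the object `speeds`

cat("mean speed:", mean(speeds), "\n")         # cat() prints after coercing to character

unlink(c(csv_path, bin_path))                  # remove the temporary files
```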
8. Logical operations

!x           logical NOT
x & y        logical AND (element by element)
x && y       logical AND (for single values; returns one value)
x | y        logical OR (element by element)
x || y       logical OR (for single values; returns one value)
xor(x, y)    exclusive OR
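A brief sketch of the element-wise and single-value operators on two made-up logical vectors:

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

!x          # FALSE TRUE FALSE
x & y       # TRUE FALSE FALSE
x | y       # TRUE TRUE TRUE
xor(x, y)   # FALSE TRUE TRUE

# && and || are meant for single values, for example in if () conditions
TRUE && FALSE   # FALSE
FALSE || TRUE   # TRUE
```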

