Small quirks of the R language

Source: Internet
Author: User

This article records some of the minor problems I have run into while learning R, along with some ways of doing things that differ from other programming languages. It will be updated as I go.

1. Division

The division operator in R is the same as in other common languages: /

> 8/5
[1] 1.6

But the remainder (modulo) operator is %%, not a single percent sign:

> 8%%5
[1] 3

Integer division, which keeps only the whole-number part of the quotient, is %/%:

> 8%/%5
[1] 1

Rounding the result of a division: round()

round() takes an optional second argument that says where to round. A positive value keeps that many decimal places; a negative value rounds to that many places to the left of the decimal point (for example, -2 rounds to the nearest hundred). When it is omitted, the result is rounded to a whole number.

> round(8/5)
[1] 2

> round(3.141592653, 2)
[1] 3.14

> round(3.141592653*100000, -2)
[1] 314200
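The examples above only use positive operands. As a small sketch of two edge cases that are easy to trip over (the names x, y, q and r are illustrative, not from the original): with a negative dividend, %/% rounds down rather than toward zero and %% takes the sign of the divisor, and round() sends exact halves to the nearest even number.

```r
# Illustrative values, not from the original article
x <- -8
y <- 5

q <- x %/% y   # integer division rounds down (floor): -2, not -1
r <- x %% y    # the remainder takes the sign of the divisor: 2

# The two operators are defined so that x == q * y + r
stopifnot(q * y + r == x)
print(c(quotient = q, remainder = r))

# round() sends exact halves to the even neighbour
halves <- round(c(0.5, 1.5, 2.5))
print(halves)   # 0 2 2
```

If you need "round half up" behaviour instead, it has to be written by hand, for example with floor(x + 0.5).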


2. The difference between list and data.frame

list and data.frame are two common structures for working with tabular data in R, along with the matrix.

First, the matrix: all of its data must be of the same type.

> b <- matrix(c(1,1,1, 2,2,3, 1,3,4, 2,1,4), ncol=3, byrow=TRUE)
> b
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    3
[3,]    1    3    4
[4,]    2    1    4

> a <- matrix(c(1,1,"wo", 2,2,3, 1,3,4, 2,1,4), ncol=3, byrow=TRUE)
> a
     [,1] [,2] [,3]
[1,] "1"  "1"  "wo"
[2,] "2"  "2"  "3"
[3,] "1"  "3"  "4"
[4,] "2"  "1"  "4"

> mode(a)
[1] "character"
> mode(b)
[1] "numeric"

You can see that the only difference between a and b is that a contains one character value, "wo". Yet the printout shows that all the other, numeric values in a were converted to character as well.
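As a minimal sketch of the same coercion rule (the objects m, back and l are made up for illustration), the following checks the mode directly, converts the strings back to numbers, and contrasts the matrix with a list, which keeps each element's own type.

```r
# Illustrative objects, not from the original article
# One string among the numbers turns the whole matrix into character
m <- matrix(c(1, 2, "wo", 4), ncol = 2)
print(mode(m))     # "character"

# Numeric-looking strings convert back; "wo" cannot, so it becomes NA
back <- suppressWarnings(as.numeric(m))
print(back)

# A list does not coerce: every element keeps its own mode
l <- list(1, 2, "wo", 4)
modes <- sapply(l, mode)
print(modes)
```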

Now for the differences between list and data.frame. Both can hold data of different types, but they differ in several ways.

Difference 1: The data is displayed differently. A list is printed one component at a time, each as its own vector; a data.frame is printed as a table with one row per record.

> list <- list(a=c("hai","tian","xiang","jie","de"), b=c("di","fang","jiu","shi","wo"), c=c("qian","gua","de","gu","xiang"))
> list
$a
[1] "hai" "tian" "xiang" "jie" "de"

$b
[1] "di" "fang" "jiu" "shi" "wo"

$c
[1] "qian" "gua" "de" "gu" "xiang"

The data frame used below is built from the same list (stringsAsFactors=TRUE is written out because it stopped being the default in R 4.0.0):

> dataframe <- as.data.frame(list, stringsAsFactors=TRUE)
> dataframe
      a    b     c
1   hai   di  qian
2  tian fang   gua
3 xiang  jiu    de
4   jie  shi    gu
5    de   wo xiang

> head(list, n=1)
$a
[1] "hai" "tian" "xiang" "jie" "de"

> head(dataframe, n=1)
    a  b    c
1 hai di qian


Difference 2: Viewing the names. For a list, names() returns the names of its components (I think of them as its row names). For a data.frame, colnames() returns the column names and rownames() returns the row names; when no row names have been defined, they default to the sequence 1, 2, 3, 4, ...

> names(list)
[1] "a" "b" "c"
> colnames(dataframe)
[1] "a" "b" "c"
> rownames(dataframe)
[1] "1" "2" "3" "4" "5"
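A short sketch of two related details, using a made-up data frame df rather than the one above: names() also works on a data.frame and returns the same thing as colnames(), and the default row names can be replaced and then used for lookups.

```r
# Illustrative data frame, not from the original article
df <- data.frame(a = c("hai", "tian"), b = c("di", "fang"),
                 stringsAsFactors = FALSE)

# For a data.frame, names() and colnames() agree
same <- identical(names(df), colnames(df))
print(same)

# Replace the default row names "1", "2" with labels of our own
rownames(df) <- c("first", "second")
cell <- df["second", "b"]   # look a cell up by row name and column name
print(cell)
```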

Difference 3: A list can hold components of different lengths, while every column of a data.frame must have the same length. When all components of a list have the same length, as.data.frame() can convert it to a data.frame.

> list2 <- list(a=1:5, b=1:4)
> list2
$a
[1] 1 2 3 4 5

$b
[1] 1 2 3 4

> dataframe2 <- as.data.frame(list2)
Error in data.frame(a = 1:5, b = 1:4, check.names = TRUE, stringsAsFactors = TRUE) :
  arguments imply differing number of rows: 5, 4

> list2 <- list(a=1:5, b=6:10)
> list2
$a
[1] 1 2 3 4 5

$b
[1] 6 7 8 9 10

> dataframe2 <- as.data.frame(list2)
> dataframe2
  a  b
1 1  6
2 2  7
3 3  8
4 4  9
5 5 10
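One way to avoid running into that error is to check the component lengths before converting. The sketch below assumes a small helper, same_length(), which is hypothetical and not part of R or of the original article; lengths() and try() are base R.

```r
# Illustrative lists, not from the original article
ragged <- list(a = 1:5, b = 1:4)
even   <- list(a = 1:5, b = 6:10)

# Hypothetical helper: TRUE when all components have one common length
same_length <- function(l) length(unique(lengths(l))) == 1

print(same_length(ragged))   # FALSE
print(same_length(even))     # TRUE

# try() captures the conversion error instead of stopping the script
res <- try(as.data.frame(ragged), silent = TRUE)
failed <- inherits(res, "try-error")
print(failed)

df <- as.data.frame(even)
print(dim(df))               # 5 rows, 2 columns
```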

Difference 4: Data is referenced differently. Both support the $ operator, but [] and [[]] behave differently: [] returns a smaller list or data.frame, [[]] returns the component itself, and only the data.frame accepts a [row, column] pair.

> list$a
[1] "hai" "tian" "xiang" "jie" "de"
> dataframe$a
[1] hai tian xiang jie de
Levels: de hai jie tian xiang

> list[1]
$a
[1] "hai" "tian" "xiang" "jie" "de"

> dataframe[1]
      a
1   hai
2  tian
3 xiang
4   jie
5    de

> list[[1]]
[1] "hai" "tian" "xiang" "jie" "de"
> dataframe[[1]]
[1] hai tian xiang jie de
Levels: de hai jie tian xiang

> list[[2]][1]
[1] "di"
> dataframe[[2]][1]
[1] di
Levels: di fang jiu shi wo

> list[2,1]
Error in list[2, 1] : incorrect number of dimensions
> dataframe[2,1]
[1] tian
Levels: de hai jie tian xiang
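The difference between [] and [[]] is easiest to see by asking for the class of each result. This is a minimal sketch with made-up objects l and df; stringsAsFactors = FALSE is set so that the result is the same on every R version.

```r
# Illustrative objects, not from the original article
l  <- list(a = c("hai", "tian"), b = c("di", "fang"))
df <- as.data.frame(l, stringsAsFactors = FALSE)

print(class(l[1]))      # "list": still a list, just shorter
print(class(l[[1]]))    # "character": the component itself
print(class(df[1]))     # "data.frame": still a table, one column
print(class(df[[1]]))   # "character": the column itself

# Only the data.frame has two dimensions, so only it accepts [row, column]
print(dim(l))           # NULL
print(dim(df))          # 2 2
print(df[2, 1])         # "tian"
```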

Difference 5: A data.frame can contain factors. In Difference 4, whenever a column or a single item of dataframe is printed, a Levels line appears below the data. That is the factor of the column: it gives the set of unique values the column can take. (Character columns are stored as factors only when stringsAsFactors=TRUE, which was the default before R 4.0.0.) The origin and role of factors come up again in section 4, so they are not elaborated here.
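To make the Levels line less mysterious, here is a small sketch of what a factor stores. The vector f is built directly with factor() for illustration; the values are the same strings used in column a above.

```r
# Illustrative factor, built by hand
f <- factor(c("hai", "tian", "xiang", "jie", "de"))

print(levels(f))         # the unique values, sorted
print(as.integer(f))     # each value stored as its position in levels(f)
print(as.character(f))   # back to plain strings

# Dropping an element keeps every level, which is why the Levels line
# above does not shrink; droplevels() removes the unused ones
sub <- f[-1]
print(nlevels(sub))
print(nlevels(droplevels(sub)))
```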

3. Deleting a row or column

This applies to both list and dataframe. To delete a component or column, refer to it directly and assign NULL; to leave out elements, rows, or columns by position, use a negative index with the - sign, as the following session shows.

> list$a <- NULL
> list
$b
[1] "di" "fang" "jiu" "shi" "wo"

$c
[1] "qian" "gua" "de" "gu" "xiang"

(The list is re-created as in section 2 before the next commands.)

> list[-1]
$b
[1] "di" "fang" "jiu" "shi" "wo"

$c
[1] "qian" "gua" "de" "gu" "xiang"

> list$a
[1] "hai" "tian" "xiang" "jie" "de"
> list$a[-1]
[1] "tian" "xiang" "jie" "de"

> list[-1,]
Error in list[-1, ] : incorrect number of dimensions
> list[,-1]
Error in list[, -1] : incorrect number of dimensions

> dataframe$a <- NULL
> dataframe
     b     c
1   di  qian
2 fang   gua
3  jiu    de
4  shi    gu
5   wo xiang

> dataframe <- as.data.frame(list, stringsAsFactors=TRUE)
> dataframe[-1,]
      a    b     c
2  tian fang   gua
3 xiang  jiu    de
4   jie  shi    gu
5    de   wo xiang

> dataframe <- as.data.frame(list, stringsAsFactors=TRUE)
> dataframe[,-1]
     b     c
1   di  qian
2 fang   gua
3  jiu    de
4  shi    gu
5   wo xiang

> dataframe$b
[1] di fang jiu shi wo
Levels: di fang jiu shi wo
> dataframe$b[-2]
[1] di jiu shi wo
Levels: di fang jiu shi wo


A matrix supports similar operations for deleting a row or a column, and several rows or columns can be deleted at once by passing a vector of negative indices.

> dataframe$c
[1] qian gua de gu xiang
Levels: de gu gua qian xiang
> dataframe$c[c(-1,-3,-5)]
[1] gua gu
Levels: de gu gua qian xiang
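The session above drops several elements from a single column. For the matrix case mentioned in the text, a minimal sketch might look like this (the matrix m is made up for illustration):

```r
# Illustrative 4 x 3 matrix, not from the original article
m <- matrix(1:12, nrow = 4, byrow = TRUE)

no_row1   <- m[-1, ]         # drop row 1
no_col2   <- m[, -2]         # drop column 2
no_rows13 <- m[-c(1, 3), ]   # drop rows 1 and 3 in one step
print(no_rows13)

# When only one column is left the result collapses to a plain vector,
# unless drop = FALSE is given
print(dim(m[, -c(1, 2)]))                 # NULL
print(dim(m[, -c(1, 2), drop = FALSE]))   # 4 1
```

Note that none of these expressions changes m itself; to make a deletion permanent, assign the result back, as in m <- m[-1, ].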

4. Reading data

This section covers a few small details of read.table() and read.csv(). Both store the data they read as a data.frame.

Encoding: By default both functions assume the file is in the native encoding of the system, which on Chinese Windows is GBK, so a file saved as UTF-8 comes out garbled. read.csv() is a wrapper around read.table(), so both accept the fileEncoding parameter (for example fileEncoding="UTF-8") to state the encoding of the file being read.

Header: read.csv() defaults to header=TRUE, meaning the first line holds the column names; read.table() defaults to header=FALSE, meaning there is no header line.

Factor: Both read.csv() and read.table() have a stringsAsFactors parameter. Before R 4.0.0 it defaulted to TRUE, so unless you set it, every character column was stored as a factor when the data was read; since R 4.0.0 the default is FALSE. In the following example, read with stringsAsFactors=TRUE, the fruit column has become a factor: the data is stored as the codes 1,1,2,4,3, and the codes 1-4 correspond, in order, to the levels Apple Banana Grape Grapefruit. When we look at the column, though, it is still displayed as character data.

> test
       fruit price
1      Apple  5.98
2      Apple  3.50
3     Banana  4.50
4 Grapefruit  4.80
5      Grape  8.70

> test$fruit
[1] Apple Apple Banana Grapefruit Grape
Levels: Apple Banana Grape Grapefruit
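The three details above (encoding, header, factors) can be tried without a real data file. This is only a sketch: the CSV content is typed inline to mirror the fruit table, the temporary file path is generated by tempfile(), and the variable names are made up. With these English strings the factor levels sort alphabetically.

```r
# Inline CSV text standing in for a file on disk
csv_text <- "fruit,price\nApple,5.98\nApple,3.50\nBanana,4.50\nGrapefruit,4.80\nGrape,8.70"

# read.csv() assumes a header line; the text argument reads from a string
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

print(class(as_chr$fruit))        # "character"
print(class(as_fct$fruit))        # "factor"
print(levels(as_fct$fruit))
print(as.integer(as_fct$fruit))   # the codes stored behind the labels

# Write the same text as UTF-8, then state the encoding when reading it
path <- tempfile(fileext = ".csv")
con <- file(path, open = "w", encoding = "UTF-8")
writeLines(csv_text, con)
close(con)

from_file <- read.csv(path, fileEncoding = "UTF-8")
print(nrow(from_file))
```

Setting stringsAsFactors explicitly, as done here, makes a script behave the same before and after R 4.0.0.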

Reading data from the clipboard: The data in the previous example can also be read straight from the clipboard. In Excel, select the data area to read, right-click and choose Copy, then run read.table("clipboard"). The "clipboard" file name is a Windows feature.

> test <- read.table("clipboard", header=TRUE)
> test
       fruit price
1      Apple  5.98
2      Apple  3.50
3     Banana  4.50
4 Grapefruit  4.80
5      Grape  8.70

5. Methods for merging two tables

This section introduces merge(), a method for combining two tables that is similar to JOIN in MySQL. By default it takes the intersection, matching rows on the columns whose names the two tables share. The method is described in great detail at this link: http://my.oschina.net/u/1791586/blog/337054. What I want to point out are the parameters all, all.x and all.y. Each takes the value T or F and determines whether all rows of both data frames, of x, or of y are kept. The effect is similar to a full join, a left join, and a right join respectively. The linked reference has good examples.
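As a quick sketch of the four join styles, here are two small made-up data frames x and y that share an id column (the data is illustrative and not taken from the linked reference):

```r
# Illustrative tables: ids 1-3 in x, ids 2-4 in y
x <- data.frame(id = c(1, 2, 3), fruit = c("Apple", "Banana", "Grape"))
y <- data.frame(id = c(2, 3, 4), price = c(4.50, 8.70, 4.80))

inner <- merge(x, y)                 # default: only ids found in both
left  <- merge(x, y, all.x = TRUE)   # every row of x
right <- merge(x, y, all.y = TRUE)   # every row of y
full  <- merge(x, y, all = TRUE)     # every row of both

print(inner$id)   # 2 3
print(left$id)    # 1 2 3
print(right$id)   # 2 3 4
print(full)       # unmatched cells are filled with NA
```

When the key columns have different names in the two tables, merge() also takes by.x and by.y to name them explicitly.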

6. View the data.

View() can display list, vector, and data.frame data, but in RStudio on Windows, View() shows Chinese text garbled. The problem does not occur on Mac or Linux, and I have not found any RStudio setting that avoids it. The plain R console does not have the problem.

fix() can also display list, vector, and data.frame data.

The difference is that fix() can show a list whose components have different lengths, while View() can only show tidy data, that is, data in which every column has the same number of rows. In addition, fix() opens the data in an edit box where it can be modified, whereas View() only displays it.

> test <- list(a=c("A","B","C","D"), b=1:4)
> View(test)
> fix(test)
> fix(dataframe)


7. Viewing data types in R: mode()/class()/typeof()

I did not find a very detailed explanation, so this is only my own general understanding; it may be wrong, and corrections are welcome. All three functions report the type of an object, but they differ in small details.

In R, all data, objects, functions, and expressions can be inspected with mode(). The main modes are: logical, numeric, complex, raw, character, list, expression, name, call, and function. The mode can be regarded as the broadest type category.

Every object also has a typeof() and a class(). typeof() reports the internal storage type and can be more fine-grained than the other two; for a plain numeric vector it separates double from integer.

> x <- c(1,2,3,4,5)
> mode(x)
[1] "numeric"
> class(x)
[1] "numeric"
> typeof(x)
[1] "double"
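A single numeric vector does not show much of the difference between the three functions. As a rough sketch, the following applies all three to a handful of made-up objects and collects the answers in one table (the list objs and its element names are illustrative only):

```r
# Illustrative objects of several kinds
objs <- list(
  double   = c(1, 2, 3),
  integer  = 1:3,
  text     = "a",
  flag     = TRUE,
  fun      = sum,
  frame    = data.frame(x = 1),
  a_matrix = matrix(1:4, 2)
)

# class() can return several values, so only the first one is kept
summary_tbl <- data.frame(
  mode   = sapply(objs, mode),
  class  = sapply(objs, function(o) class(o)[1]),
  typeof = sapply(objs, typeof),
  stringsAsFactors = FALSE
)
print(summary_tbl)
```

In the printed table, 1:3 has mode numeric but typeof integer, and the data frame has mode and typeof list while its class is data.frame.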

If you reprint this article, please credit the source. Thank you!
