
Creating Datasets, from "R in Action" (Chapter 2: Data Structures)

Last Update: 2017-11-26 Source: Internet

Author: User



Datasets

2.1 Dataset concepts

Concept: A dataset is usually a rectangular array of data, in which rows represent observations and columns represent variables.

Different fields use different names for the rows and columns of a dataset:

Field	Rows	Columns
Statisticians	Observations	Variables
Database analysts	Records	Fields
Data mining and machine learning	Examples	Attributes

Data types (modes) that R can handle: numeric, character, logical, complex, and raw (bytes).
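R's built-in mode() function reports the mode of any object. Here is a minimal sketch with one sample value per mode. The variable names are arbitrary and not from the book.

```r
# One example value for each of the five data modes
x_num <- 3.14          # numeric
x_chr <- "R"           # character
x_lgl <- TRUE          # logical
x_cpx <- 1 + 2i        # complex
x_raw <- as.raw(255)   # raw (a single byte)

# mode() describes how each object is stored
modes <- sapply(list(x_num, x_chr, x_lgl, x_cpx, x_raw), mode)
print(modes)  # "numeric" "character" "logical" "complex" "raw"
```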

Structures for storing data: scalars, vectors, arrays, data frames, and lists.

Case identifiers are stored as row names (rownames). Categorical variables are stored as factors.

2.2 Data structures

This section describes several data structures: vectors, matrices, arrays, and data frames. Vectors are one-dimensional, matrices are two-dimensional, and arrays can have more than two dimensions. Each of these three structures can hold data of only one mode. A data frame, by contrast, can hold data of several modes.

Some definitions

Object: Anything that can be assigned to a variable, including constants, data structures, functions, and graphics.

Mode: Describes how an object is stored (for example, numeric, character, or logical).

Data frame: A structure for storing data in which columns represent variables and rows represent observations. A data frame can store variables of different types (such as numeric and character) in the same structure.

2.2.1 Vectors (one-dimensional; numeric, character, or logical)

```
a <- c(1, 2, 5, 3, 6, -2, 4)    # numeric
b <- c("one", "two", "three")   # character
c <- c(TRUE, TRUE, FALSE)       # logical
```

Notes: 1. Elements of a character vector must be enclosed in double ("") or single ('') quotes. Numeric and logical elements need no quotes.

2. A single vector can hold data of only one mode.

3. A scalar is a vector that contains only one element.

```
# Scalars are vectors with only one element
f <- 1
g <- "US"
h <- TRUE
```
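A vector can hold only one mode, so c() silently converts mixed inputs to a common mode (logical, then numeric, then character). This short sketch, which is not from the book, shows that coercion. It also confirms that a scalar such as f is simply a length-one vector.

```r
# Mixing a string with numbers and logicals coerces everything to character
mixed <- c(1, "two", TRUE)
print(mixed)        # "1" "two" "TRUE"
print(mode(mixed))  # "character"

# Mixing numbers and logicals coerces to numeric: TRUE -> 1, FALSE -> 0
num_lgl <- c(1, TRUE, FALSE)
print(num_lgl)      # 1 1 0

# A scalar is just a vector of length 1
f <- 1
print(is.vector(f)) # TRUE
print(length(f))    # 1
```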

Square brackets: To access elements of a vector, put their positions inside square brackets, as in the following code.

```
> a <- c("K", "J", "H", "A", "C", "M")   # create a vector
> a[3]                                    # third element of a
[1] "H"
> a[c(1, 3, 5)]                           # 1st, 3rd, and 5th elements of a
[1] "K" "H" "C"
> a[2:6]                                  # 2:6 generates the sequence 2, 3, 4, 5, 6, so this is a[c(2, 3, 4, 5, 6)]
[1] "J" "H" "A" "C" "M"
> # The two ways below produce a vector a with the same values
> a <- c(2:6)
> a
[1] 2 3 4 5 6
> a <- c(2, 3, 4, 5, 6)
> a
[1] 2 3 4 5 6
```

2.2.2 Matrices (two-dimensional; numeric, character, or logical)

Note: A matrix can contain only one data type (mode).

Function: matrix()

Purpose: Creates a matrix

Format:

```
mymatrix <- matrix(vector, nrow = number_of_rows, ncol = number_of_columns,
                   byrow = logical_value,
                   dimnames = list(char_vector_rownames, char_vector_colnames))
```

Here, vector contains the elements of the matrix, and nrow and ncol give the number of rows and columns. dimnames (optional) is a list of character vectors holding the row and column names. byrow specifies whether the matrix is filled by row (byrow = TRUE) or by column (byrow = FALSE). By default it is filled by column.

Matrix usage examples

Example 1. Create a 5 × 4 matrix holding the elements 1 through 20, filled by column (the default).

```
> y <- matrix(1:20, nrow = 5, ncol = 4)
> y
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
```

Example 2.

```
> cells <- c(1, 26, 24, 68)
> rnames <- c("R1", "R2")
> cnames <- c("C1", "C2")
> # Fill by column (also the default)
> mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = FALSE,
+                    dimnames = list(rnames, cnames))
> mymatrix
   C1 C2
R1  1 24
R2 26 68
> # Fill by row
> mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
+                    dimnames = list(rnames, cnames))
> mymatrix
   C1 C2
R1  1 26
R2 24 68
```

Selecting elements of a matrix:

X[i,] is row i of matrix X, X[,j] is column j, and X[i,j] is the element in row i, column j.

To select several rows or columns, make the subscripts i and j numeric vectors.

Example:

```
> x <- matrix(1:10, nrow = 2)
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> x[2, ]
[1]  2  4  6  8 10
> x[, 2]
[1] 3 4
> x[1, 4]          # element in row 1, column 4
[1] 7
> x[1, c(4, 5)]    # elements in row 1, columns 4 and 5
[1] 7 9
```
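Numeric vectors can also be used for both subscripts at once, which returns a submatrix. Here is a minimal sketch on the same x. The choice of rows and columns is arbitrary.

```r
x <- matrix(1:10, nrow = 2)

# Rows 1 and 2, columns 2 and 4: a 2 x 2 submatrix
sub <- x[1:2, c(2, 4)]
print(sub)

# Leaving the row subscript empty keeps all rows; here columns 1 and 5
cols <- x[, c(1, 5)]
print(cols)
```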

2.2.3 Arrays (can have more than two dimensions)

Note: An array can hold data of only one mode.

Function: array()

```
myarray <- array(vector, dimensions, dimnames)
```

Here, vector contains the data for the array, and dimensions is a numeric vector giving the maximum index of each dimension. dimnames is an optional list of dimension labels.

Example: create a three-dimensional (2 × 3 × 4) numeric array

```
> dim1 <- c("A1", "A2")
> dim2 <- c("B1", "B2", "B3")
> dim3 <- c("C1", "C2", "C3", "C4")
> z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))
> z
, , C1

   B1 B2 B3
A1  1  3  5
A2  2  4  6

, , C2

   B1 B2 B3
A1  7  9 11
A2  8 10 12

, , C3

   B1 B2 B3
A1 13 15 17
A2 14 16 18

, , C4

   B1 B2 B3
A1 19 21 23
A2 20 22 24
```

Passing the list positionally, as in array(1:24, c(2, 3, 4), list(dim1, dim2, dim3)), produces the same array, because dimnames is the third argument of array().

Elements are selected the same way as in a matrix. For example, z[1,2,3] is 15.
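Because z carries dimension names, elements can be selected by name as well as by position. Leaving subscripts empty returns a whole slice. Here is a minimal sketch that rebuilds the array above.

```r
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

print(z[1, 2, 3])           # 15: row A1, column B2, layer C3
print(z["A1", "B2", "C3"])  # the same element, selected by dimension names

# Empty subscripts keep every index in that dimension: the whole C3 layer
layer <- z[, , "C3"]
print(layer)
```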

2.2.4 Data frames (can contain data of different modes, such as numeric and character)

Note: Unlike a matrix, a data frame can hold data of several modes. The data in each column must have a single mode, but different columns can have different modes.

Function: data.frame()

```
mydata <- data.frame(col1, col2, col3)
```

Here, the column vectors col1, col2, and col3 can be of any type (such as character, numeric, or logical).

```
> patientid <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type2")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> patientdata <- data.frame(patientid, age, diabetes, status)
> patientdata
  patientid age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type2      Poor
```
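To confirm that each column keeps its own mode, sapply() can apply class() to every column. Here is a minimal sketch. It passes stringsAsFactors = FALSE explicitly so that the result is the same in older R versions, which converted strings to factors by default, and in R 4.0 and later, which do not.

```r
patientid <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type2")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientid, age, diabetes, status,
                          stringsAsFactors = FALSE)

# One class per column: the columns differ from one another,
# but each column is internally uniform
col_classes <- sapply(patientdata, class)
print(col_classes)
```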

Selecting elements of a data frame: 1. by column number (subscript); 2. by column name; 3. with $, which selects a specific variable from a given data frame.

```
> patientdata[1:2]
  patientid age
1         1  25
2         2  34
3         3  28
4         4  52
> patientdata[c(1:3)]
  patientid age diabetes
1         1  25    Type1
2         2  34    Type2
3         3  28    Type1
4         4  52    Type2
> patientdata[c("diabetes", "status")]
  diabetes    status
1    Type1      Poor
2    Type2  Improved
3    Type1 Excellent
4    Type2      Poor
> patientdata$age
[1] 25 34 28 52
```

To cross-tabulate diabetes by status:

```
> table(patientdata$diabetes, patientdata$status)

        Excellent Improved Poor
  Type1         1        0    1
  Type2         0        1    1
```

A simpler way to refer to variables than writing dataframe$variable is to use attach() and detach(), or with().

1. attach(), detach(), and with()

attach() adds a data frame to R's search path. Once the data frame is attached, you can refer to one of its variables without telling R which data frame it belongs to.

detach() removes the data frame from the search path.

attach() and detach() usually appear as a pair. However, detach() does nothing to the data frame itself, so it can be omitted.

```
summary(mtcars$mpg)
plot(mtcars$mpg, mtcars$disp)
plot(mtcars$mpg, mtcars$wt)
```

is equivalent to

```
attach(mtcars)
summary(mpg)
plot(mpg, disp)
plot(mpg, wt)
detach(mtcars)
```
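The effect of attach() and detach() on the search path can be checked directly with search(), which lists the attached environments by name. Here is a minimal sketch using the built-in mtcars dataset.

```r
attach(mtcars)
on_path <- "mtcars" %in% search()   # TRUE: mtcars is now on the search path
m <- mean(mpg)                      # mpg is found without writing mtcars$mpg
detach(mtcars)
off_path <- "mtcars" %in% search()  # FALSE: detach() removed it again

print(c(on_path, off_path))
print(m)
```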

Limitation: attach() causes problems when more than one object has the same name. The object that already exists takes precedence, and the attached variable of the same name is masked.
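The masking problem is easy to reproduce. In this minimal sketch, a global object happens to share a name with a column of mtcars. The values c(25, 36, 47) are arbitrary.

```r
mpg <- c(25, 36, 47)  # a global object with the same name as a column of mtcars
attach(mtcars)        # R prints a message that mpg is masked by .GlobalEnv
n <- length(mpg)      # 3: the global mpg is found first, not mtcars$mpg (32 values)
detach(mtcars)
rm(mpg)               # clean up the global object
print(n)
```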

with() produces the same result as the code above:

```
with(mtcars, {
  print(summary(mpg))
  plot(mpg, disp)
  plot(mpg, wt)
})
```

Note that the statements inside the curly braces are not separated by commas. Put each statement on its own line (or separate them with semicolons). When I ran the code without line breaks, it did not work. The statements inside the braces are evaluated with respect to the data frame mtcars. If the braces contain only one statement, they can be omitted.
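The three forms described above look like this in practice. with() returns the value of the last expression evaluated, so the results can be compared. This is a minimal sketch, and the variable names are arbitrary.

```r
# One statement: the braces can be omitted
avg <- with(mtcars, mean(mpg))

# Several statements on one line: separate them with semicolons
rng <- with(mtcars, { lo <- min(mpg); hi <- max(mpg); c(lo, hi) })

# Several statements on separate lines: no commas or semicolons needed
stats <- with(mtcars, {
  lo <- min(mpg)
  hi <- max(mpg)
  c(lo, hi)
})

print(avg)
print(identical(rng, stats))  # TRUE
```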

Limitation: Assignments made inside with() exist only within that function call.

Improvement: Using the special assignment operator <<- instead of <- saves the object to the global environment, outside the with() call.

```
> with(mtcars, {
+   nokeepstates <- summary(mpg)
+   keepstates <<- summary(mpg)
+ })
> nokeepstates
Error: object 'nokeepstates' not found
> keepstates
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.40   15.43   19.20   20.09   22.80   33.90
```

The result speaks for itself. keepstates is saved to the global environment outside with(), but nokeepstates is not, so once with() finishes, only keepstates exists.

2. Case identifiers

Use row.names = to specify the variable that serves as the case identifier. I think of it as playing the same role as a student number or an employee number.

```
patientdata <- data.frame(patientid, age, diabetes, status, row.names = patientid)
```

This specifies patientid as the variable R uses to label cases in various printed outputs and graphs (this is the book's exact wording). My understanding is that patientid is the variable in the data frame that identifies each case: every instance, or observation, has its own unique value.
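Once row.names is set, the identifiers are available through rownames(), and a row can be looked up by its identifier instead of by position. Here is a minimal sketch with the patient data from above. Note that R stores row names as character strings.

```r
patientid <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type2")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientid, age, diabetes, status,
                          row.names = patientid)

print(rownames(patientdata))     # "1" "2" "3" "4" (stored as character)
print(patientdata["3", "age"])   # 28: the row is found by its identifier
```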

2.2.5 Factors (determine how data are analyzed and how they are presented visually)

Variable type	Description	Stored in R as
Nominal	Categorical, with no order	Factor
Ordinal	Ordered, but with no quantitative relationship	Ordered factor
Continuous	Both ordered and quantitative	----

Function: factor()

Purpose: Stores categorical values as a vector of integers starting at 1, together with an internal vector of character strings (the original values) mapped to those integers.

Converting raw values to factors

Nominal variable ---> stored as an integer vector

```
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> diabetes <- factor(diabetes)   # stores diabetes as (1, 2, 1, 1)
> diabetes                       # mapping: 1 = Type1, 2 = Type2 (levels assigned alphabetically)
[1] Type1 Type2 Type1 Type1
Levels: Type1 Type2
> str(diabetes)
 Factor w/ 2 levels "Type1","Type2": 1 2 1 1
```

Note: Any analysis of diabetes will treat it as a nominal variable and automatically select statistical methods suited to that scale of measurement.
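The integer codes and the level labels described above can be inspected directly with levels() and as.integer(). Here is a minimal sketch using the diabetes vector from the nominal-variable example.

```r
diabetes <- c("Type1", "Type2", "Type1", "Type1")
diabetes <- factor(diabetes)

print(levels(diabetes))      # "Type1" "Type2": the labels, sorted alphabetically
print(as.integer(diabetes))  # 1 2 1 1: the integer codes actually stored
print(table(diabetes))       # frequency of each level
```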

Ordinal variable ---> stored as an integer vector (add the argument ordered = TRUE to factor())

```
> status <- c("Poor", "Improved", "Excellent", "Poor")
> status
[1] "Poor"      "Improved"  "Excellent" "Poor"
> status <- factor(status, ordered = TRUE, levels = c("Poor", "Improved", "Excellent"))  # levels overrides the default alphabetical order
> str(status)
 Ord.factor w/ 3 levels "Poor"<"Improved"<..: 1 2 3 1
```

Note: Any analysis of this variable treats it as ordinal and automatically selects appropriate statistical methods.
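One visible consequence of the ordering is that comparison operators and max() work on an ordered factor according to its levels. They do not use alphabetical order. Here is a minimal sketch with the status example.

```r
status <- c("Poor", "Improved", "Excellent", "Poor")
status <- factor(status, ordered = TRUE,
                 levels = c("Poor", "Improved", "Excellent"))

print(status[1] < status[3])  # TRUE: "Poor" comes before "Excellent"
print(max(status))            # Excellent, the highest level present
print(status >= "Improved")   # FALSE TRUE TRUE FALSE
```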

Numeric variables (the levels and labels arguments are needed)
Suppose males are coded 1 and females 2.
```
> sex <- c(1, 1, 2)
> sex <- factor(sex, levels = c(1, 2), labels = c("Male", "Female"))
> str(sex)
 Factor w/ 2 levels "Male","Female": 1 1 2
```
Note: The order of the labels, labels = c("Male", "Female"), must match the order of the levels, levels = c(1, 2).
In the output, the labels "Male" and "Female" replace 1 and 2. Any value of sex other than 1 or 2 is treated as missing (NA).
```
> sex <- c(1, 2, 3)   # third value assumed for illustration; any code other than 1 or 2 becomes NA
> sex <- factor(sex, levels = c(1, 2), labels = c("Male", "Female"))
> str(sex)
 Factor w/ 2 levels "Male","Female": 1 2 NA
```

The following example shows how ordinary (nominal) and ordered factors affect data analysis.

```
# Enter the data as vectors
patientid <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type2")
status <- c("Poor", "Improved", "Excellent", "Poor")
# Make diabetes an ordinary (nominal) factor
diabetes <- factor(diabetes)
# Make status an ordered factor
status <- factor(status, ordered = TRUE)
# Combine the data into a data frame
patientdata <- data.frame(patientid, age, diabetes, status)
# str(object) displays the structure of an object in R (here, a data frame)
str(patientdata)
# summary() treats each variable according to its type and shows a statistical summary
summary(patientdata)
```

Output:

```
> str(patientdata)
'data.frame':   4 obs. of  4 variables:
 $ patientid: num  1 2 3 4
 $ age      : num  25 34 28 52
 $ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 2
 $ status   : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
> summary(patientdata)
   patientid         age         diabetes       status
 Min.   :1.00   Min.   :25.00   Type1:2   Excellent:1
 1st Qu.:1.75   1st Qu.:27.25   Type2:2   Improved :1
 Median :2.50   Median :31.00             Poor     :2
 Mean   :2.50   Mean   :34.75
 3rd Qu.:3.25   3rd Qu.:38.50
 Max.   :4.00   Max.   :52.00
```

str() clearly shows that diabetes is a factor and status is an ordered factor, and it shows how each is encoded internally in the data frame.

summary() treats each variable according to its type. For the continuous variable age, it shows the minimum, maximum, mean, and quartiles. For the factors diabetes and status, it shows the frequency of each level.

2.2.6 Lists

Definition: An ordered collection of objects (components). A list lets you gather several (possibly unrelated) objects under a single object name.

For example, a list can combine several vectors, matrices, data frames, and even other lists.

Function for creating lists: list()

```
mylist <- list(object1, object2, ...)
```

Naming the objects in a list: mylist <- list(name1 = object1, name2 = object2)

```
> g <- "My first List"                 # character string
> h <- c(25, 26, 18, 39)               # numeric vector
> j <- matrix(1:10, nrow = 5)          # 5 x 2 matrix
> k <- c("one", "two", "three")        # character vector
> mylist <- list(title = g, h, j, k)   # create the list; only the first object is named (title)
> mylist
$title
[1] "My first List"

[[2]]
[1] 25 26 18 39

[[3]]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

[[4]]
[1] "one"   "two"   "three"
```

Accessing list elements: 1. by component number, using double square brackets; 2. by name.

```
> mylist[[1]]
[1] "My first List"
> mylist[["title"]]
[1] "My first List"
> mylist$title       # works only for named components
[1] "My first List"
```
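Double brackets return the component itself. A single bracket returns a smaller list that still wraps the component. This distinction is not spelled out above, so here is a minimal sketch that rebuilds the same list to show it.

```r
g <- "My first List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, h, j, k)

one_bracket <- mylist[1]     # a list of length 1 that contains the title
two_bracket <- mylist[[1]]   # the title string itself
print(class(one_bracket))    # "list"
print(class(two_bracket))    # "character"

# Indexing can be chained: third element of the second component
print(mylist[[2]][3])        # 18
```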

These are the basic data structures.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
