R in Action Reading Notes 2: Creating Datasets (Part 1)

Chapter 2 Creating Datasets

2.1 Concepts of Datasets

Different fields use different names for the rows and columns of a dataset. Statisticians call them observations and variables, database analysts call them records and fields, and researchers in data mining and machine learning call them examples and attributes. See Table 2-1.

In the dataset shown in Table 2-1, PatientID is the row/case identifier, AdmDate is a date variable, Age is a continuous variable, Diabetes is a nominal variable, and Status is an ordinal variable.

The data types (modes) that R can handle include numeric, character, logical (TRUE/FALSE), complex (imaginary numbers), and raw (bytes). In R, PatientID, AdmDate, and Age would be numeric variables, while Diabetes and Status would be character variables.

2.2 Data Structures

2.2.1 Scalar

A scalar is a vector that contains only one element. Scalars are used to hold constants.
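As a minimal sketch of this idea (the variable names and values below are illustrative, not taken from the original notes), a scalar is created like any other vector and simply has length one:

```r
# Scalars are ordinary vectors of length one (illustrative values)
f <- 3        # numeric scalar
g <- "US"     # character scalar
h <- TRUE     # logical scalar

length(f)     # 1
is.vector(f)  # TRUE
```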

2.2.2 Vector

Vectors are one-dimensional arrays that hold numeric, character, or logical data. The combine function c() is used to create vectors.
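A short sketch of c() in use follows. The values are illustrative assumptions; only the three kinds of vector come from the notes.

```r
# Build one vector of each basic mode with c() (illustrative values)
a <- c(1, 2, 5, 3, 6, -2, 4)                   # numeric vector
b <- c("one", "two", "three")                  # character vector
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)   # logical vector

# Naming a variable c does not hide the function c(): when a name is used
# in a call, R looks for a function with that name.
mode(a)       # "numeric"
mode(b)       # "character"
mode(c)       # "logical"
a[c(2, 4)]    # 2 3 (second and fourth elements)
```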

Here, a is a numeric vector, b is a character vector, and c is a logical vector.

Note that the data in a single vector must all have the same type, or mode (numeric, character, or logical). Different modes cannot be mixed in the same vector.
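One practical consequence is worth a quick sketch. When c() is given values of different modes, it does not stop with an error. Instead it coerces everything to a single common mode. This is standard R behavior rather than something stated in the original notes, and the values are illustrative.

```r
# Mixing modes in c() triggers coercion to one common mode
mixed_chr <- c(1, "two", TRUE)   # everything becomes character
mode(mixed_chr)                  # "character"
mixed_chr                        # "1" "two" "TRUE"

mixed_num <- c(1, TRUE, FALSE)   # logicals become numbers
mode(mixed_num)                  # "numeric"
mixed_num                        # 1 1 0
```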

2.2.3 Matrix

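A matrix is a two-dimensional array in which every element has the same mode (numeric, character, or logical). Matrices are created with the function matrix(). The general format is:

mymatrix <- matrix(vector, nrow=number_of_rows, ncol=number_of_columns, byrow=logical_value, dimnames=list(char_vector_rownames, char_vector_colnames))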

Here vector contains the elements of the matrix, nrow and ncol specify the number of rows and columns, and dimnames contains optional row and column labels stored in character vectors. The option byrow indicates whether the matrix should be filled by row (byrow=TRUE) or by column (byrow=FALSE). The default is by column. The code in Listing 2-1 demonstrates the use of the matrix function.
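The effect of byrow is easiest to see side by side. The following is a minimal sketch with made-up cell values and labels, not a reproduction of the book's listing:

```r
# The same four values filled by row and by column (illustrative data)
cells  <- c(1, 26, 24, 68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")

by_row <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
                 dimnames = list(rnames, cnames))
by_col <- matrix(cells, nrow = 2, ncol = 2, byrow = FALSE,
                 dimnames = list(rnames, cnames))

by_row["R1", "C2"]   # 26: the second value continues along row 1
by_col["R1", "C2"]   # 24: the second value went down column 1 instead
by_row[2, ]          # entire second row: 24 68
```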

2.2.4 Array

An array is similar to a matrix but can have more than two dimensions. Arrays are created with the function array() in the following form:

myarray <- array(vector, dimensions, dimnames)

Here vector contains the data for the array, dimensions is a numeric vector giving the maximal index for each dimension, and dimnames is an optional list of dimension labels. Listing 2-3 shows an example of creating a three-dimensional (2x3x4) numeric array.
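A 2x3x4 array along these lines might be built as follows. This is a sketch with illustrative label names, not the book's listing itself:

```r
# A 2 x 3 x 4 array holding the integers 1..24 (labels are illustrative)
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")

z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

dim(z)                 # 2 3 4
z[1, 2, 3]             # 15: arrays are filled with the first index varying fastest
z["A2", "B3", "C4"]    # 24: the last element
```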

2.2.5 Data Frame

A data frame is more general than a matrix because different columns can contain different modes of data (numeric, character, and so on). The data frame will be the data structure you most often work with in R.

A data frame is created with the function data.frame():

mydata <- data.frame(col1, col2, col3, ...)

Here the column vectors col1, col2, col3, ... can be of any type, such as character, numeric, or logical. The name of each column can be set with the function names().
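As a sketch, a small patient dataset like the one described in section 2.1 could be assembled this way. The values are illustrative, and the renaming step is only one possible use of names().

```r
# Columns of different modes combined into one data frame (illustrative data)
patientID <- c(1, 2, 3, 4)
age       <- c(25, 34, 28, 52)
diabetes  <- c("Type1", "Type2", "Type1", "Type1")
status    <- c("Poor", "Improved", "Excellent", "Poor")

patientdata <- data.frame(patientID, age, diabetes, status)

names(patientdata)      # "patientID" "age" "diabetes" "status"
patientdata$age         # 25 34 28 52
patientdata[1:2, c("diabetes", "status")]   # first two rows, two columns

# names() can also be used to rename columns; done here on a copy
renamed <- patientdata
names(renamed)[2] <- "ageYears"
```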

2.2.6 Factor

Variables can be classified as nominal, ordinal, or continuous. Diabetes type, Diabetes (Type1, Type2), is an example of a nominal variable. Ordinal variables imply an order but not an amount. The condition Status (poor, improved, excellent) is a good example of an ordinal variable. A continuous variable can take on any value within some range, and both order and amount are implied. Age is a continuous variable that can take values such as 14.5 or 22.8 and any value in between.

Categorical (nominal) variables and ordered categorical (ordinal) variables are called factors in R.

The function factor() stores the categorical values as a vector of integers in the range [1...k] (where k is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) is mapped to these integers.

To represent an ordinal variable, specify the parameter ordered=TRUE in the call to factor().
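The integer coding can be inspected directly. The sketch below uses illustrative values. It also uses the levels argument of factor(), which the notes above do not mention, because the default level order is alphabetical and would otherwise rank "Excellent" lowest.

```r
# Nominal factor: integer codes mapped to the original strings
diabetes   <- c("Type1", "Type2", "Type1", "Type1")
diabetes_f <- factor(diabetes)
levels(diabetes_f)       # "Type1" "Type2"
as.integer(diabetes_f)   # 1 2 1 1

# Ordered factor: default levels are sorted alphabetically
status         <- c("Poor", "Improved", "Excellent", "Poor")
status_default <- factor(status, ordered = TRUE)
levels(status_default)   # "Excellent" "Improved" "Poor"

# Supplying levels explicitly gives the intended order
status_f <- factor(status, ordered = TRUE,
                   levels = c("Poor", "Improved", "Excellent"))
as.integer(status_f)         # 1 2 3 1
status_f[1] < status_f[2]    # TRUE: "Poor" ranks below "Improved"
```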

2.2.7 List

Lists are the most complex of R's data types. Basically, a list is an ordered collection of objects (components). A list lets you gather several (possibly unrelated) objects under one name. For example, a list may contain a combination of vectors, matrices, data frames, and even other lists.

A list is created with the function list(), as in mylist <- list(object1, object2, ...), where the objects can be any of the structures mentioned so far.
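A small sketch shows a list that mixes several structures, with both named and unnamed components. All names and values here are illustrative.

```r
# A list holding a string, a numeric vector, a matrix, and a character vector
g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")

mylist <- list(title = g, ages = h, j, k)

length(mylist)       # 4 components
mylist[[2]]          # 25 26 18 39 (access by position)
mylist[["ages"]]     # the same component, accessed by name
mylist$title         # "My First List"
dim(mylist[[3]])     # 5 2 (the matrix is stored intact)
```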

Supplement: attach(), detach(), and with()

Take the mtcars dataset that ships with R as an example.

The function attach() adds a data frame to R's search path. When R encounters a variable name, it checks the data frames in the search path to locate the variable.

The function detach() removes the data frame from the search path. Note that detach() does nothing to the data frame itself.
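The round trip can be sketched as follows, assuming a fresh session in which no other object named mpg exists (otherwise attach() reports a masking message). The variable mean_mpg is a hypothetical name chosen for illustration.

```r
# Put mtcars on the search path, use a column by bare name, then remove it
attach(mtcars)
mean_mpg <- mean(mpg)      # mpg is found through the search path
"mtcars" %in% search()     # TRUE while attached
detach(mtcars)

"mtcars" %in% search()     # FALSE after detaching
exists("mtcars")           # TRUE: the data frame itself is untouched
```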

An alternative approach is to use the function with(). In this case, the statements within the curly braces {} are evaluated with reference to the data frame mtcars, so there is no need to worry about name conflicts. If there is only one statement (for example, summary(mpg)), the curly braces {} can be omitted. The limitation of with() is that assignments exist only within the function call.

If you need to create an object that will exist outside the with() construct, use the special assignment operator <<- instead of the standard one (<-). It saves the object to the global environment, outside of the with() call.
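The difference between the two assignment operators inside with() can be sketched as below. The object names stats_local and stats_kept are hypothetical, and the sketch assumes it is run at the top level of a session.

```r
# Assignments with <- stay inside with(); <<- saves the object outside it
with(mtcars, {
  stats_local <- summary(mpg)    # exists only during this call
  stats_kept  <<- summary(mpg)   # saved outside the with() call
})

exists("stats_local")   # FALSE
exists("stats_kept")    # TRUE

# With a single statement the braces can be dropped
mpg_summary <- with(mtcars, summary(mpg))
```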
