R Data Type


Last Update:2018-08-22 Source: Internet

Author: User



R data storage types

Basic types

The basic types each store a single value. They mainly include numeric, integer, complex, character, and logical.

Numeric

Numeric, or "double", is R's default way of storing numbers and is equivalent to double in C. Note that "numeric" is sometimes used as a collective term for both integer and double. The variable .Machine$double.eps gives the floating-point precision of doubles in the current environment.

Integer stores whole numbers and is equivalent to int in C. R stores a number as numeric by default, whether or not it has a decimal part, so use the as.integer function to force it to be stored as an integer. .Machine$integer.max gives the largest integer that can be stored, which is always 2^31-1 = 2147483647.
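As a minimal sketch of the difference between numeric and integer storage (the variable names x, y, and z are illustrative only):

```r
# A literal number is stored as a double by default.
x <- 5
typeof(x)          # "double"

# as.integer() or the L suffix stores the value as an integer.
y <- as.integer(5)
z <- 5L
typeof(y)          # "integer"
typeof(z)          # "integer"

# Both doubles and integers count as "numeric".
is.numeric(x)      # TRUE
is.numeric(y)      # TRUE

# Limits reported for the current environment.
.Machine$integer.max   # 2147483647
.Machine$double.eps    # floating-point precision of doubles
```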

Complex is the storage type for complex numbers.

Character

Character is the type that stores text; it holds both single characters and whole strings.

Logical

Logical is the type that stores Boolean values and has only two values, TRUE (T) and FALSE (F).

Time

The date-time classes are designed specifically for storing times. POSIXct stores a time as a number: the seconds elapsed since January 1, 1970 (UTC). POSIXlt stores a time as a list of components such as year, month, day, hour, minute, and second. Use unclass to convert an object of either class to its underlying representation. Related functions include as.POSIXct, as.POSIXlt, strptime, strftime, ISOdate, and ISOdatetime, and the chron package can also handle times.

Data structures

Data is often more than a single value, and R has good structures for storing multiple values.

Vector

A vector is similar to a one-dimensional array and stores elements of one basic type. If any element is a character, all values are converted to character. Note that a vector can have length 1, so is.vector also returns TRUE for a single value; every variable holding a single basic value is treated as a vector. Whatever the length of a vector, functions such as class or mode report the type of its elements: if all elements are character, they return "character".
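A short sketch of these two points, coercion to character and single values being vectors (the variables v and s are illustrative only):

```r
# Mixing numbers and strings coerces every element to character.
v <- c(1, 2, "a")
class(v)           # "character"
v                  # "1" "2" "a"

# A single value is already a vector of length 1.
s <- 3.14
is.vector(s)       # TRUE
length(s)          # 1

# class() reports the element type regardless of length.
class(s)               # "numeric"
class(c(1.5, 2.5))     # "numeric"
```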

The idea behind this is that R tries to ignore the difference between singular and plural. Making little distinction between a single value and a vector is a characteristic of R: the language is designed for processing large amounts of data, and its structure and logic follow from that requirement. For example, adding 1 to an integer vector whose length is greater than 1 gives a new vector in which every element is the original element plus 1.

x <- 1:6
x + 1
[1] 2 3 4 5 6 7

If two equal-length vectors are added, then the corresponding elements are added together.

x <- 1:6
y <- 1:6
x + y
[1]  2  4  6  8 10 12

If the two vectors differ in length, the shorter vector is repeated (recycled) until it is as long as the longer one. The addition works cleanly only when the longer length is a multiple of the shorter length; otherwise R still returns a result but issues a warning. Adding a single value to a vector is simply the special case in which the shorter vector has length 1. This is another way in which R blurs the distinction between a single value and a vector.

x <- 1:6
z <- 1:2
x + z
[1] 2 4 4 6 6 8
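For comparison, here is a hedged sketch of what happens when the longer length is not a multiple of the shorter one. The vector w and the handler code are illustrative additions; the warning is captured only so that it can be inspected.

```r
x <- 1:6
w <- 1:4   # length 4 does not divide length 6

# R still recycles w (1 2 3 4 1 2) but signals a warning,
# which is captured here so the message can be inspected.
msg <- NULL
res <- withCallingHandlers(
  x + w,
  warning = function(cond) {
    msg <<- conditionMessage(cond)
    invokeRestart("muffleWarning")
  }
)
res   # 2 4 6 8 6 8
msg   # the warning text about the mismatched lengths
```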

Factor

A factor is a way to help save memory. If a series of values contains many duplicates, a factor stores each distinct original value only once and represents every element as an integer code that refers to it, which saves space.

Each distinct original value is called a level, and you can set the order of the levels yourself. The levels function returns all possible levels of a factor, and nlevels returns the number of levels.
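A small sketch of levels and nlevels, using made-up data (the vector size and its values are hypothetical):

```r
# Hypothetical example data with many repeated values.
size <- c("small", "large", "small", "medium", "large", "small")

# Set the level order explicitly instead of the default alphabetical order.
f <- factor(size, levels = c("small", "medium", "large"))

levels(f)       # "small" "medium" "large"
nlevels(f)      # 3
as.integer(f)   # internal codes: 1 3 1 2 3 1
```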

To convert a vector to a factor, just use the as.factor function. Sometimes we need to convert numbers to a factor, process them, and then convert the factor back to numbers. In that case as.numeric cannot be applied directly, because it returns the factor's internal codes rather than the original values. We need to use as.character or levels to get the character values first and then convert those to numbers.

myfactor <- factor(c(10, 20, 50), levels = c(10, 20, 50), ordered = TRUE)
as.numeric(levels(myfactor)[myfactor])
as.numeric(as.character(myfactor))
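To make the pitfall concrete, the sketch below contrasts the internal codes with the recovered values. The factor f and its data are hypothetical.

```r
# Hypothetical numeric data stored as a factor; the levels are "10", "20", "50".
f <- factor(c(50, 10, 20, 10))

as.numeric(f)                 # internal codes: 3 1 2 1
as.numeric(as.character(f))   # original values: 50 10 20 10
as.numeric(levels(f))[f]      # original values: 50 10 20 10
```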

Sometimes we need to generate a factor to use as a parameter or as test data, and the gl function does this. The name gl can be read as an abbreviation of "generate levels". Its main parameters are: n sets the number of levels; k sets the number of repetitions of each level; length sets the length of the result and can be omitted when the first two parameters already determine it; labels sets the values of the levels; ordered accepts a Boolean value and sets whether the levels are ordered.
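A brief sketch of these parameters (the label names are made up for illustration):

```r
# Two levels, each repeated three times: 1 1 1 2 2 2.
g0 <- gl(2, 3)

# With k = 1 and length = 6 the levels alternate: 1 2 1 2 1 2.
g1 <- gl(2, 1, length = 6)

# labels names the levels; ordered = TRUE makes the factor ordered.
g2 <- gl(3, 2, labels = c("low", "mid", "high"), ordered = TRUE)
as.character(g2)   # "low" "low" "mid" "mid" "high" "high"
is.ordered(g2)     # TRUE
```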

Note that when combining several factors with the c function, you should first convert each factor back to its original values and then call c. Otherwise, in R versions before 4.1.0, c combines the integer codes that represent the factors in memory, and the original meaning is lost. (Since R 4.1.0, c applied to factors returns a factor whose levels are the union of the input levels.)
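One way to do this, sketched with two hypothetical factors f1 and f2, is to go through as.character and rebuild the factor afterwards:

```r
f1 <- factor(c("a", "b"))
f2 <- factor(c("c", "a"))

# Convert each factor back to its original values, combine, then re-create the factor.
combined <- factor(c(as.character(f1), as.character(f2)))

as.character(combined)   # "a" "b" "c" "a"
levels(combined)         # "a" "b" "c"
```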

Suppose I have a vector of continuous values. If I want a histogram, I can draw one directly with the corresponding function. If I do not need the plot and only want to know how the values are distributed over their range, I can use the cut function. cut divides the range into intervals and converts the original vector into a factor whose levels are those intervals, so you can see which interval each value belongs to. The factor has the same length as the original vector, and you can set the levels of the factor yourself. You can then use the table function to count the number of values in each interval.

aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)
cut(aaa, 3)
 [1] (0.994,3] (0.994,3] (0.994,3] (3,5]     (3,5]     (0.994,3] (0.994,3]
 [8] (3,5]     (3,5]     (5,7.01]  (5,7.01]
Levels: (0.994,3] (3,5] (5,7.01]
cut(aaa, 3, dig.lab = 4, ordered_result = TRUE)
 [1] (0.994,3] (0.994,3] (0.994,3] (3,5]     (3,5]     (0.994,3] (0.994,3]
 [8] (3,5]     (3,5]     (5,7.006] (5,7.006]
Levels: (0.994,3] < (3,5] < (5,7.006]
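Counting the values per interval with table could look like the following sketch, which reuses the same data (the names bins and counts are illustrative):

```r
aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)

# Cut into 3 equal-width intervals and count the values in each.
bins <- cut(aaa, 3)
counts <- table(bins)
counts
# (0.994,3]: 5, (3,5]: 4, (5,7.01]: 2

length(bins) == length(aaa)   # TRUE: one interval label per original value
```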

Sometimes I need to know the combinations of two factors. The interaction function gives the combinations of the levels of several factors. Not all of these combinations necessarily occur in the data; if you set drop = TRUE, the levels without data are dropped and only the levels that actually occur are kept.

a <- gl(2, 4, 8)
b <- gl(2, 2, 8, labels = c("ctrl", "treat"))
interaction(a, b, drop = TRUE, sep = ".")
[1] 1.ctrl  1.ctrl  1.treat 1.treat 2.ctrl  2.ctrl  2.treat 2.treat
Levels: 1.ctrl 2.ctrl 1.treat 2.treat
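In the example above every combination occurs, so drop = TRUE removes nothing. A sketch with hypothetical factors in which one combination is missing shows the effect:

```r
# Hypothetical factors: the combination "y" with "q" never occurs.
a <- factor(c("x", "x", "y"))
b <- factor(c("p", "q", "p"))

all_combos  <- interaction(a, b)               # keeps every possible combination
used_combos <- interaction(a, b, drop = TRUE)  # keeps only combinations with data

levels(all_combos)    # "x.p" "y.p" "x.q" "y.q"
levels(used_combos)   # "x.p" "y.p" "x.q"
```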

Matrix

A matrix is a two-dimensional array in which all elements have the same type. The related functions are as.matrix and is.matrix.

Elements of a matrix are selected with subscripts. In general the subscript has the form [row, col], where row and col can be vectors: either vectors of the row or column numbers you want, or vectors of Boolean values. If there is no comma inside the square brackets and the element is selected as [num], R treats the matrix as a one-dimensional vector and returns the value at that position. The matrix is unrolled into the vector in column-major order, so for a 2 by 2 matrix, [3] returns the value at [1,2].
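A small sketch of these indexing forms on a hypothetical 2 by 2 matrix m:

```r
# A 2 x 2 matrix is filled column by column: 1 and 2 in column 1, 3 and 4 in column 2.
m <- matrix(1:4, nrow = 2)

m[1, 2]               # 3: row 1, column 2
m[3]                  # 3: third element in column-major order, the same cell as m[1, 2]
m[c(TRUE, FALSE), ]   # 1 3: a logical index selects row 1
m[, 2]                # 3 4: all rows of column 2
```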

When a single row or column of a matrix is selected, the dimension of the return value is reduced and the result is a plain vector. To prevent this and keep the result as a matrix, add the argument drop = FALSE inside the brackets, as in [row, , drop = FALSE].
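The difference can be seen by checking dim on both results (m, r1, and r2 are illustrative names):

```r
m <- matrix(1:6, nrow = 2)

r1 <- m[1, ]                 # dimension dropped: plain vector 1 3 5
r2 <- m[1, , drop = FALSE]   # still a matrix with 1 row and 3 columns

dim(r1)   # NULL
dim(r2)   # 1 3
```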

In memory, a matrix is a one-dimensional vector stored in column-major order. So when you need to build a matrix, it is best to create one that is large enough at the outset and then fill in the values, rather than building a small matrix and growing it with rbind or cbind. Each time the matrix grows, R has to allocate new space, and when rows are added (which does not match the column-major storage order) the existing elements must also be rearranged, which is very inefficient. If the full size turns out not to be needed in the end, just reassign the part that was used once.

Array

An array can have any number of dimensions.

List

A list can combine variables of different types, and a list can also contain other lists.

Selecting an element of a list requires special attention. If you use single brackets "[]", the result is a sub-list of the list; to get the content itself, you need double brackets "[[]]" or the dollar sign "$". You can select an element by name or by number.

mylist <- list(one = "one", two = c(2, 2))
mylist
$one
[1] "one"

$two
[1] 2 2

mylist["one"]
$one
[1] "one"

mylist[["one"]]
[1] "one"
mylist$one
[1] "one"

Because a list can hold objects of many types, it is convenient for bringing together a wide variety of related data. Loosely speaking, a list can be thought of as a C++ class designed only to store data. Related variables are often given similar names to distinguish them from other variables, but when there are many variables this is still inconvenient to search. Instead, we can put the related variables in one list and access them by selecting elements of that list. If you forget a variable's name, you can use the names function or the str function to look up the names of the elements in the list. This approach is ideal for saving the same or similar processing results for multiple datasets, and you can use a for loop to store the data.

rna.gene.fpkm <- list()  # the empty list must be created beforehand
for (nam in dir(rna.cuffnorm.result.dir)) {
    rna.gene.fpkm[[nam]] <- read.table(  # create a list element and assign to it
        file.path("./",
                  nam,
                  "genes.fpkm_table"),
        header = TRUE,
        sep    = "\t")
}
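The loop above depends on files on disk. A self-contained sketch of the same pattern, with hypothetical dataset names and small data frames built in memory instead of read from files, might look like this:

```r
# Hypothetical dataset names; each "result" here is a small data frame built in memory.
dataset.names <- c("sampleA", "sampleB", "sampleC")

results <- list()   # start from an empty list
for (nam in dataset.names) {
  # Create the element named nam and assign to it.
  results[[nam]] <- data.frame(id = 1:3, value = (1:3) * nchar(nam))
}

names(results)                # "sampleA" "sampleB" "sampleC"
str(results, max.level = 1)   # overview of the elements
results$sampleB$value         # 7 14 21
```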

The elements of a list are not stored contiguously in memory; the list only holds references to them, somewhat like a linked structure in C. As a result, adding elements to a list is not as costly as growing a matrix, and using rbind on a list or the corresponding data frame is not as slow as using it on a matrix. That is why we can create an empty list (or data frame) first and then add elements to it incrementally, for example in a for loop.

Data frame

A data.frame (data frame) is a special kind of list. Like a matrix, it requires every column to have the same length; like a list, it allows each column to have a different type. A data frame looks very much like a table in Excel, and its column and row names correspond to the column and row names of the Excel table. Because a data frame has characteristics of both a list and a matrix, its elements can be accessed in both ways: use "$" to take a column as with a list, or use "[row, col]" to take a specific value as with a matrix.

Querying the type of a variable

Commonly used functions for querying the type of a variable are mode, storage.mode, class, and typeof, and they differ in some respects. storage.mode reports how the data is actually stored in memory. class reports the object-oriented class in R: for example, a data.frame is actually stored (storage.mode) as a list, but it is wrapped as the data.frame class so that tabular data can be handled more conveniently. The results of mode and typeof are very close to the actual storage type, except that mode reports both "integer" and "double" as "numeric".
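A quick side-by-side comparison of the four functions, using illustrative values i, d, and df:

```r
i <- 1L
d <- 1.5
df <- data.frame(x = 1:2, y = c("a", "b"))

# integer vs double: mode() reports both as "numeric".
c(mode = mode(i), storage.mode = storage.mode(i), typeof = typeof(i), class = class(i))
# "numeric" "integer" "integer" "integer"
c(mode = mode(d), storage.mode = storage.mode(d), typeof = typeof(d), class = class(d))
# "numeric" "double" "double" "numeric"

# A data frame is stored as a list; only class() reports "data.frame".
typeof(df)         # "list"
storage.mode(df)   # "list"
mode(df)           # "list"
class(df)          # "data.frame"
```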

If you need to know the limits of each basic type in your environment, you can query the list .Machine, which stores the corresponding values.

NA

For a variety of reasons, data may contain missing values, and R represents a missing value as NA. Calculations can also produce the special values Inf (for example 1/0) and NaN (for example 0/0). Handling missing values is an essential part of data analysis. To test for them, use the is.na or is.nan function: is.na returns TRUE for both NA and NaN (but not for Inf), while is.nan returns TRUE only for NaN. Functions such as mean, var, sum, min, and max have an na.rm parameter; when it is set to TRUE, NA values are removed before the calculation. Functions such as lm, glm, and gam have an na.action parameter that accepts a function as its value, such as na.omit, na.fail, na.pass, or na.exclude. na.omit returns a data.frame that contains only the complete rows, which means that any row containing one or more NA values is excluded; complete.cases returns a logical vector marking the complete rows, which can be used to select them. For functions such as read.table, you can use na.strings to treat particular values or strings as NA.
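A compact sketch of these functions on made-up data (the vector v and the data frame df are hypothetical):

```r
v <- c(1, NA, NaN, Inf)
is.na(v)    # FALSE  TRUE  TRUE FALSE
is.nan(v)   # FALSE FALSE  TRUE FALSE

# na.rm = TRUE removes missing values before computing.
mean(c(1, NA, 3))                 # NA
mean(c(1, NA, 3), na.rm = TRUE)   # 2

# Hypothetical data frame with one incomplete row.
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"))
complete.cases(df)   # TRUE FALSE TRUE
na.omit(df)          # keeps rows 1 and 3 only
```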

This article is an English version of an article originally written in Chinese on aliyun.com.
