R Language Notes 3: Subsetting R objects, partial matching, and removing missing values from data frames


Last Update:2018-08-23 Source: Internet

Author: User


R has three basic operators for subsetting objects:

[ : the single bracket returns an object of the same class as the original; for example, a subset of a vector is still a vector. It can also select more than one element.

[[ : the double bracket extracts exactly one element from a list or a data frame. Because the elements of a list or data frame need not share one class, the returned object is not necessarily a list or a data frame.

$ : the dollar sign extracts an element of a list or data frame by name.

(i) Subsetting vectors

1. Single bracket + numeric index:

> x <- c("a", "b", "c", "c", "d", "a")
> x[2]            ## extract the second element
[1] "b"
> x[1:4]          ## extract several contiguous elements
[1] "a" "b" "c" "c"
> x[c(1, 3, 4)]   ## extract several non-contiguous elements
[1] "a" "c" "c"

2. Single bracket + logical index (strings are compared in alphabetical order):

> x <- c("a", "b", "c", "c", "d", "a")
> x[x > "a"]      ## extract the elements that sort after "a"
[1] "b" "c" "c" "d"

3. Creating a logical vector first and subsetting with it:

> u <- x > "a"
> u
[1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE
> x[u]
[1] "b" "c" "c" "d"
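As a small sketch of the same idea, a stored logical vector can be reused for subsetting, counted, or converted to positions. The vector x repeats the example above; which() and sum() are base R functions that the original text does not use, added here only for illustration.

```r
x <- c("a", "b", "c", "c", "d", "a")

# Logical vector: TRUE where the element sorts after "a"
u <- x > "a"

# Subsetting with the stored vector gives the same result as the
# inline comparison
by_stored <- x[u]
by_inline <- x[x > "a"]

# which() converts the TRUE entries into numeric positions
pos <- which(u)

# sum() counts the TRUE entries, because TRUE is treated as 1
n_match <- sum(u)

print(by_stored)
print(pos)
print(n_match)
```

Keeping the logical vector in its own variable is convenient when the same condition has to be applied to several objects of the same length.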

(ii) Subsetting matrices

A matrix is subset with a row index and a column index.

For example, take the following 2 x 3 matrix:

> x <- matrix(1:6, 2, 3)
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> x[1, 2]
[1] 3
> x[, 1]
[1] 1 2
> x[1, 2, drop = FALSE]   ## with drop = FALSE the result stays a matrix
     [,1]
[1,]    3
> x[, 1, drop = FALSE]
     [,1]
[1,]    1
[2,]    2
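The effect of the drop argument can be checked by inspecting the dimensions of the result. The following is a minimal sketch on the same 2 x 3 matrix; the variable names row_vec and row_mat are chosen here for illustration.

```r
x <- matrix(1:6, 2, 3)

# Default (drop = TRUE): a single row collapses to a plain vector
row_vec <- x[1, ]

# drop = FALSE keeps the result as a matrix with one row
row_mat <- x[1, , drop = FALSE]

print(dim(row_vec))        # a plain vector has no dim attribute
print(dim(row_mat))
print(is.matrix(row_mat))
```

Keeping the matrix form matters when later code expects two dimensions, for example code that calls nrow() or ncol() on the result.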

(iii) Subsetting lists

A list can be subset with [, [[, or $.

> ## create a list containing two elements
> x <- list(foo = 1:4, bar = 0.6)
> x
$foo
[1] 1 2 3 4

$bar
[1] 0.6

> ## three ways to extract the first element
> x[1]            ## the single bracket returns a list
$foo
[1] 1 2 3 4

> x[[1]]
[1] 1 2 3 4
> x$foo
[1] 1 2 3 4
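The difference between the three results is easiest to see from their classes. This is a short sketch on the same list; the variable names single, double, and dollar are illustrative.

```r
x <- list(foo = 1:4, bar = 0.6)

single <- x[1]     # a list of length 1 that still carries the name "foo"
double <- x[[1]]   # the integer vector stored in the first element
dollar <- x$foo    # the same integer vector, reached by name

print(class(single))
print(class(double))
print(identical(double, dollar))
```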

The advantage of extracting by name is that you do not need to remember the position of an element, only its name.

However, to extract more than one element from a list you must use single brackets, because double brackets and the dollar sign each extract only one element. With a numeric index you also need to know the positions.

> x <- list(foo = 1:4, bar = 0.6, baz = "hello")
> x[c(1, 3)]
$foo
[1] 1 2 3 4

$baz
[1] "hello"
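Single brackets also accept a character vector of names, which avoids having to know the positions. This is standard base R behaviour rather than something the original example shows; the sketch below reuses the same three-element list.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

# Select by position, as in the example above
by_pos <- x[c(1, 3)]

# Select the same two elements by name
by_name <- x[c("foo", "baz")]

print(names(by_name))
print(identical(by_pos, by_name))
```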

The difference between double brackets and the dollar sign:

The dollar sign must be followed by the literal name of an element; the name is not evaluated.

Double brackets accept a computed index, such as a variable that holds the name.

> x <- list(foo = 1:4, bar = 0.6, baz = "hello")
> name <- "foo"
>
> ## computed index for "foo"
> x[[name]]
[1] 1 2 3 4
>
> ## element "name" doesn't exist! (but no error here)
> x$name
NULL
>
> ## element "foo" does exist
> x$foo
[1] 1 2 3 4
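A computed index is what makes it possible to visit list elements in a loop or inside a function. The sketch below assumes the same three-element list; vapply() is a base R function not used in the original text, and the variable names classes and literal are illustrative.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

# Look up every element through a variable that holds its name
classes <- vapply(names(x), function(n) class(x[[n]]), character(1))
print(classes)

# With $, the text after the sign is taken literally, so this looks
# for an element called "n" and finds none
n <- "foo"
literal <- x$n
print(is.null(literal))
```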

In addition, double brackets accept an integer vector, not just a single number; the vector indexes recursively into nested lists.

> x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))   ## the inner values 10, 12, 14 are illustrative
>
> ## get the 3rd element of the 1st element
> x[[c(1, 3)]]
[1] 14
>
> ## same as above
> x[[1]][[3]]
[1] 14
>
> ## 1st element of the 2nd element
> x[[c(2, 1)]]
[1] 3.14
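The equivalence between the recursive index and the chained form can be checked directly. In this sketch the inner values 10, 12, and 14 are made up for illustration, as are the variable names.

```r
# Nested list; the inner values are illustrative
x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))

recursive <- x[[c(1, 3)]]   # 3rd element of the 1st element
chained   <- x[[1]][[3]]    # the same lookup written as two steps
second    <- x[[c(2, 1)]]   # 1st element of the 2nd element

print(identical(recursive, chained))
print(second)
```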

(iv) Partial matching

Both the dollar sign $ and double brackets [[ support partial matching of names, which lets you reach elements quickly at the command line. The dollar sign matches partially by default; double brackets do so only with exact = FALSE.

Example:

> x <- list(aardvark = 1:5)
> x$a
[1] 1 2 3 4 5
> x[["a"]]                 ## exact matching is the default
NULL
> x[["a", exact = FALSE]]  ## exact = FALSE allows partial matching
[1] 1 2 3 4 5
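Partial matching only succeeds when the prefix identifies a single name. The following sketch adds a second, hypothetical element named abacus to show what happens when a prefix is ambiguous; the variable names are illustrative.

```r
# Two names that both start with "a"; "abacus" is a made-up element
x <- list(aardvark = 1:5, abacus = "beads")

unique_prefix <- x$aa                      # only "aardvark" starts with "aa"
ambiguous     <- x$a                       # two candidates, so no match
exact_default <- x[["aa"]]                 # [[ matches exactly by default
partial_opt   <- x[["aa", exact = FALSE]]  # partial matching switched on

print(unique_prefix)
print(is.null(ambiguous))
print(is.null(exact_default))
print(partial_opt)
```

Because an ambiguous or missing name silently yields NULL, partial matching is better suited to interactive use than to scripts.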

(v) Removing missing values (NA)

Most real data contain many missing values. Whether the object is a vector, a matrix, or a data frame, you can remove them by creating a logical vector that marks where the missing values are and then subsetting with it.

The is.na() function finds the missing values in a vector, for example:

> x <- c(1, 2, NA, 4, NA, 5)
> bad <- is.na(x)
> print(bad)
[1] FALSE FALSE  TRUE FALSE  TRUE FALSE
> x[!bad]         ## !bad marks the non-missing values
[1] 1 2 4 5
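The same logical vector also answers how many values are missing and where they are. This is a minimal sketch on the vector from the example above; the na.rm argument of mean() is base R behaviour that the original text does not cover.

```r
x <- c(1, 2, NA, 4, NA, 5)

bad <- is.na(x)

n_missing <- sum(bad)     # number of missing values
where     <- which(bad)   # positions of the missing values
clean     <- x[!bad]      # the non-missing values

# Many summary functions can skip NA themselves via na.rm = TRUE
m_clean <- mean(clean)
m_narm  <- mean(x, na.rm = TRUE)

print(n_missing)
print(where)
print(m_clean == m_narm)
```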

If you have several objects and want to keep only the positions where none of them is missing, you can use the complete.cases() function, as in the following example:

> x <- c(1, 2, NA, 4, NA, 5)
> y <- c("a", "b", NA, "d", NA, "f")
> good <- complete.cases(x, y)
> good
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
> x[good]
[1] 1 2 4 5
> y[good]
[1] "a" "b" "d" "f"

Similarly, if the missing values of x and y are at different positions, complete.cases() marks a position as incomplete when either vector is missing there, that is, it takes the union of the missing positions:

> x <- c(1, 2, NA, 4, NA, 5)
> y <- c("a", "b", NA, NA, "d", "f")
> good <- complete.cases(x, y)
> good
[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE
>
> x[good]
[1] 1 2 5
> y[good]
[1] "a" "b" "f"
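For two vectors, the result of complete.cases() can be reproduced by combining is.na() calls, which makes the union rule explicit. The sketch below reuses the vectors from the example above; the name manual is illustrative.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, NA, "d", "f")

good <- complete.cases(x, y)

# A position is complete only if neither x nor y is missing there
manual <- !(is.na(x) | is.na(y))

print(good)
print(all(good == manual))
```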

The complete.cases() function can also be used to remove rows with missing values from a data frame:

> ## airquality is a data frame that ships with R (package datasets)
> ## first six rows of the data frame
> head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
>
> ## find the rows without missing data by creating a logical vector
> good <- complete.cases(airquality)
> head(airquality[good, ])
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
7    23     299  8.6   65     5   7
8    19      99 13.8   59     5   8
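The row-wise behaviour can be sketched on a small hand-built data frame. All values and column names below are made up for illustration; colSums() and na.omit() are base R functions that the original text does not use.

```r
# Hypothetical measurements; the values are invented for this sketch
df <- data.frame(
  ozone = c(41, 36, NA, 18, NA),
  wind  = c(7.4, 8.0, 12.6, NA, 14.3),
  day   = 1:5
)

# TRUE for rows that have no NA in any column
good <- complete.cases(df)
kept <- df[good, ]

# Number of missing values in each column
na_per_col <- colSums(is.na(df))

# na.omit() is a shortcut that drops the same rows
omitted <- na.omit(df)

print(kept)
print(na_per_col)
print(identical(rownames(kept), rownames(omitted)))
```

Note the comma in df[good, ]: the logical vector selects rows, and the empty column index keeps every column.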

This article is an English version of an article originally published in Chinese on aliyun.com.
