R Language Notes 3: Subsetting R Objects, Partial Matching, and Removing Missing Values from Data Frames

Source: Internet
Author: User
There are three basic operators for subsetting R objects:

[ : Single brackets return an object of the same class as the original (a subset of a vector is still a vector) and can select more than one element.

[[ : Double brackets extract exactly one element from a list or a data frame. Because the elements of a list or data frame do not all have to share a class, the returned object is not necessarily a list or a data frame.

$ : The dollar sign extracts an element of a list or data frame by name.

(i) Subsetting vectors

1. Single bracket + numeric index:

> x <- c("a", "b", "c", "c", "d", "a")
> x[2]             ## extract the second element
[1] "b"
> x[1:4]           ## extract several contiguous elements
[1] "a" "b" "c" "c"
> x[c(1, 3, 4)]    ## extract several non-contiguous elements
[1] "a" "c" "c"

2. Single bracket + logical index (strings are compared alphabetically):

> x <- c("a", "b", "c", "c", "d", "a")
> x[x > "a"]       ## extract the elements that sort after "a"
[1] "b" "c" "c" "d"

3. Creating a logical vector and using it as an index:

> u <- x > "a"
> u
[1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE
> x[u]
[1] "b" "c" "c" "d"
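
The three indexing styles above can be collected into one short script. This is a minimal sketch; the variable names first_four, after_a, and positions are illustrative and do not come from the original notes, and which() is added only to show which positions the logical vector selects.

```r
# Same vector as in the examples above
x <- c("a", "b", "c", "c", "d", "a")

# Numeric index: positions 1 through 4
first_four <- x[1:4]

# Logical index: keep the elements that sort after "a"
u <- x > "a"
after_a <- x[u]

# which() turns the logical vector into the matching positions
positions <- which(u)

print(first_four)
print(after_a)
print(positions)
```

A logical index must line up with the vector it subsets, which is why u is built from x itself.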
(ii) Subsetting matrices

A matrix is subset with a row index and a column index.

For example, take the following 2 x 3 matrix:

> x <- matrix(1:6, 2, 3)
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> x[1, 2]
[1] 3
> x[, 1]
[1] 1 2
> x[1, 2, drop = FALSE]    ## drop = FALSE keeps the result as a matrix
     [,1]
[1,]    3
> x[, 1, drop = FALSE]
     [,1]
[1,]    1
[2,]    2
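
The effect of the drop argument is easiest to see by inspecting the dimensions of the result. A minimal sketch; col_vec and col_mat are illustrative names that are not part of the original notes.

```r
x <- matrix(1:6, 2, 3)

# Default behaviour: a single column is simplified to a plain vector
col_vec <- x[, 1]

# drop = FALSE keeps the result as a 2 x 1 matrix
col_mat <- x[, 1, drop = FALSE]

print(is.matrix(col_vec))  # FALSE
print(dim(col_mat))        # 2 1
```

Keeping the matrix form matters when later code expects an object with dimensions, for example when calling nrow() or ncol() on the result.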
(iii) Subsetting lists

A list can be subset with "[", "[[", or "$".

> # Create a list containing two elements
> x <- list(foo = 1:4, bar = 0.6)
> x
$foo
[1] 1 2 3 4

$bar
[1] 0.6

> # Three ways to extract the first element
> x[1]             ## single brackets return a list
$foo
[1] 1 2 3 4

> x[[1]]           ## double brackets return the element itself
[1] 1 2 3 4
> x$foo
[1] 1 2 3 4
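
The difference between the three results shows up when their classes are checked. A minimal sketch, reusing the two-element list from the example; the names as_list and as_element are illustrative.

```r
x <- list(foo = 1:4, bar = 0.6)

# Single brackets keep the container: the result is a list of length 1
as_list <- x[1]

# Double brackets (and $) return the element stored inside the list
as_element <- x[[1]]

print(class(as_list))               # "list"
print(class(as_element))            # "integer"
print(identical(as_element, x$foo)) # TRUE
```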

The advantage of extracting by name (with $ or with a name inside [[ ]]) is that you do not need to remember the position of an element, only its name.

However, if you want to extract more than one element from a list, you must use single brackets, because double brackets and the dollar sign extract only one element at a time. With a numeric index you also need to know the positions of the elements.

> x <- list(foo = 1:4, bar = 0.6, baz = "hello")
> x[c(1, 3)]
$foo
[1] 1 2 3 4

$baz
[1] "hello"
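
Single brackets also accept a character vector of names, so several elements can be selected without relying on their positions. A minimal sketch using a three-element list like the one above; by_position and by_name are illustrative names.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

# Select the first and third elements by position
by_position <- x[c(1, 3)]

# Select the same elements by name
by_name <- x[c("foo", "baz")]

print(identical(by_position, by_name))  # TRUE
```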

The difference between double brackets and the dollar sign:

The dollar sign only works with a literal element name;

Double brackets accept a computed index, such as a variable that holds the name.

> x <- list(foo = 1:4, bar = 0.6, baz = "hello")
> name <- "foo"
>
> ## computed index for "foo"
> x[[name]]
[1] 1 2 3 4
>
> ## element "name" doesn't exist! (but no error here)
> x$name
NULL
>
> ## element "foo" does exist
> x$foo
[1] 1 2 3 4
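
Computed indexes are what make it possible to walk over a list in a loop. A minimal sketch, assuming a three-element list like the one above; the vector lens is an illustrative helper.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

# [[ ]] evaluates its argument, so the name can come from a variable
lens <- integer(0)
for (name in names(x)) {
  lens[name] <- length(x[[name]])
}
print(lens)

# $ takes the name literally: x$name looks for an element called "name"
print(is.null(x$name))  # TRUE
```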

In addition, double brackets can take an integer sequence rather than a single number, which indexes nested elements recursively:

> ## the three values stored in a are illustrative
> x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
>
> ## Get the 3rd element of the 1st element
> x[[c(1, 3)]]
[1] 14
>
> ## Same as above
> x[[1]][[3]]
[1] 14
>
> ## 1st element of the 2nd element
> x[[c(2, 1)]]
[1] 3.14
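
The equivalence between a recursive index and chained double brackets can be checked directly. A minimal sketch; the values stored in the nested list are chosen for illustration, and step_by_step and recursive are illustrative names.

```r
# Illustrative nested list: a is a list of three numbers, b is a numeric vector
x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))

# Chained indexing: first element, then its third element
step_by_step <- x[[1]][[3]]

# Recursive indexing: the same path given as one integer vector
recursive <- x[[c(1, 3)]]

print(identical(step_by_step, recursive))  # TRUE
print(x[[c(2, 1)]])                        # 3.14
```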
(iv) Partial matching

Both the dollar sign $ and double brackets [[ support partial matching of element names, which lets you reach elements quickly at the command line.

Example:

> x <- list(aardvark = 1:5)
> x$a
[1] 1 2 3 4 5
> x[["a"]]                   ## [[ ]] uses exact matching by default
NULL
> x[["a", exact = FALSE]]    ## exact = FALSE allows partial matching
[1] 1 2 3 4 5
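
Partial matching only succeeds when the prefix identifies a single element. A minimal sketch, assuming a list with a second element, apple, that is added here purely for illustration:

```r
# Two names that both start with "a"
x <- list(aardvark = 1:5, apple = "fruit")

print(x$aa)                      # unique prefix: 1 2 3 4 5
print(is.null(x$a))              # ambiguous prefix returns NULL: TRUE
print(x[["ap", exact = FALSE]])  # "fruit"
```

Because an ambiguous prefix silently returns NULL, partial matching is best kept for interactive use, with full names written out in scripts.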
(v) Removing missing values (NA)

Real data often contains many missing values. Whether the object is a vector, a matrix, or a data frame, you can remove them by creating a logical vector that marks where the missing values are and subsetting with it.

The is.na() function finds the missing values in a vector, for example:

> x <- c(1, 2, NA, 4, NA, 5)
> bad <- is.na(x)
> print(bad)
[1] FALSE FALSE  TRUE FALSE  TRUE FALSE
> x[!bad]          ## !bad marks the non-missing values
[1] 1 2 4 5
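
The logical vector returned by is.na() can also be used to count and locate the missing values before removing them. A minimal sketch; n_missing, where_missing, and clean are illustrative names.

```r
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)

# TRUE counts as 1, so sum() gives the number of missing values
n_missing <- sum(bad)

# which() gives the positions of the missing values
where_missing <- which(bad)

# Keep only the non-missing values
clean <- x[!bad]

print(n_missing)      # 2
print(where_missing)  # 3 5
print(clean)          # 1 2 4 5
```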

If you have several vectors and want to drop every position at which any of them is NA, you can use the complete.cases() function, as in the following example:

> x <- c(1, 2, NA, 4, NA, 5)
> y <- c("a", "b", NA, "d", NA, "f")
> good <- complete.cases(x, y)
> good
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
> x[good]
[1] 1 2 4 5
> y[good]
[1] "a" "b" "d" "f"

Similarly, if the missing values of x and y are in different positions, complete.cases() uses the union of the missing positions:

> x <- c(1, 2, NA, 4, NA, 5)
> y <- c("a", "b", NA, NA, "d", "f")
> good <- complete.cases(x, y)
> good
[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE
>
> x[good]
[1] 1 2 5
> y[good]
[1] "a" "b" "f"
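
For two vectors, the result of complete.cases() can be reproduced by combining is.na() checks, which makes the "union of missing positions" rule explicit. A minimal sketch; manual is an illustrative name.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, NA, "d", "f")

# A position is complete only if neither vector is NA there
good <- complete.cases(x, y)
manual <- !is.na(x) & !is.na(y)

print(all(good == manual))  # TRUE
print(x[good])              # 1 2 5
print(y[good])              # "a" "b" "f"
```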

The complete.cases() function can also be used to remove rows with missing values from a data frame:

> ## airquality is a data frame that ships with R (datasets package)
> ## First six rows of the data frame
> head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
>
> ## Find the rows without missing data by creating a logical vector
> good <- complete.cases(airquality)
> head(airquality[good, ])
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
7    23     299  8.6   65     5   7
8    19      99 13.8   59     5   8
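
The same pattern works on any data frame. A minimal sketch with a small hypothetical data frame (df and its columns id, score, and group are made up for illustration and are not the airquality data):

```r
# Hypothetical data frame with missing values in two columns
df <- data.frame(
  id    = 1:5,
  score = c(90, NA, 75, 88, NA),
  group = c("a", "b", NA, "a", "b")
)

# TRUE for the rows in which no column is NA
good <- complete.cases(df)

# Keep the complete rows; the comma selects rows and keeps all columns
clean <- df[good, ]

print(nrow(df) - nrow(clean))  # 3 rows dropped
print(clean)
```

Note the trailing comma in df[good, ]: without it, the logical vector would be applied to columns instead of rows.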
