Missing Value Processing in R

Last Update: 2018-07-26 Source: Internet

Author: User


Missing values

1. is.na(): find the positions of missing values
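A minimal base-R sketch of is.na() in action. The vector x and the data frame df are made-up illustration data. Comparing with == NA yields NA rather than TRUE. By contrast, is.na() returns a logical vector, and which() turns its TRUE entries into positions.

```r
# Made-up example vector with two missing values
x <- c(3, NA, 7, NA)

x == NA                  # NA NA NA NA: a comparison with NA is never TRUE
is.na(x)                 # FALSE TRUE FALSE TRUE
which(is.na(x))          # 2 4: positions of the missing values
sum(is.na(x))            # 2: number of missing values

# For a data frame, is.na() returns a logical matrix; colSums() counts NAs per column
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))
colSums(is.na(df))       # a = 1, b = 2
```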

Note: Missing values are considered non-comparable, even to themselves. This means that comparison operators cannot be used to detect whether a missing value is present. For example, the logical test myVar == NA will never be TRUE (it evaluates to NA). Instead, you can only use functions designed for missing values, such as those described in this section, to identify missing values in R data objects.

2. na.omit(): delete incomplete observations
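A minimal sketch of na.omit() on a small made-up data frame. It drops every row that contains at least one NA. It also records the indices of the dropped rows in an "na.action" attribute.

```r
# Made-up data: row 2 has an NA in x, row 3 has an NA in y
df <- data.frame(x = c(1, NA, 3, 4), y = c(5, 6, NA, 8))

clean <- na.omit(df)
clean                          # keeps rows 1 and 4 only
nrow(clean)                    # 2
attr(clean, "na.action")       # indices of the dropped rows (2 and 3), class "omit"

# na.omit() also works on a plain vector
na.omit(c(1, NA, 3))
```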

manyNAs

manyNAs(data, nORp = 0.2)
Arguments

data
A data frame with the data set.

nORp
A number controlling when a row is considered to have too many NA values (defaults to 0.2, i.e. 20% of the columns). If no rows satisfy the constraint indicated by the user, a warning is generated.
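The proportion rule behind manyNAs() can be sketched in base R as follows. rows_with_many_nas() is a hypothetical helper written for illustration, not the DMwR source. Its boundary handling (a share strictly greater than p) is an assumption and may differ from manyNAs().

```r
# Hypothetical helper (not DMwR code): return the row numbers whose share of
# NA values is strictly greater than p (a proportion of the columns).
rows_with_many_nas <- function(data, p = 0.2) {
  na_share <- rowMeans(is.na(data))      # per-row fraction of missing columns
  idx <- unname(which(na_share > p))
  if (length(idx) == 0) warning("no rows exceed the NA threshold")
  idx
}

df <- data.frame(a = c(1, NA, NA), b = c(1, 2, NA), c = c(1, 2, 3))
bad <- rows_with_many_nas(df, p = 0.5)   # 3: row 3 is missing 2 of 3 values

# Guard the negative index: df[-integer(0), ] would return zero rows
df_kept <- if (length(bad) > 0) df[-bad, ] else df
```

The guard in the last line matters with the data[-manyNAs(data), ] idiom. If no row is flagged, indexing with -integer(0) selects no rows at all, instead of keeping every row.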
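manyNAs() thus identifies rows that have too many missing values, judged by proportion, and returns their row numbers so they can be dropped.

3. knnImputation(): k-nearest-neighbour imputation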

library(DMwR)
knnImputation(data, k = 10, scale = T, meth = "weighAvg", distData = NULL)

Arguments

data	A data frame with the data set
k	The number of nearest neighbours to use (defaults to 10)
scale	Boolean setting if the data should be scaled before finding the nearest neighbours (defaults to T)
meth	String indicating the method used to calculate the value to fill in each NA. Available values are 'median' or 'weighAvg' (the default).
distData	Optionally you can specify here a data frame containing the data set that should be used to find the neighbours. This is useful when filling in NA values on a test set, where you should use only information from the training set. This defaults to NULL, which means that the neighbours will be searched in data.

Details
This function uses the k nearest neighbours to fill in the unknown (NA) values in a data set. For each case with any NA value, it will search for its k most similar cases and use the values of these cases to fill in the unknowns.

If meth = 'median', the function will use either the median (for numeric variables) or the most frequent value (for factors) of the neighbours to fill in the NAs. If meth = 'weighAvg', the function will use a weighted average of the values of the neighbours. The weights are given by exp(-dist(k,x)), where dist(k,x) is the Euclidean distance between the case with NAs (x) and the neighbour k.
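To make the weighted-average rule concrete, here is a base-R sketch of k-nearest-neighbour imputation. knn_impute_sketch() is a hypothetical helper, not DMwR's implementation. It rests on several assumptions:

- All columns are numeric.
- Only complete rows serve as neighbours.
- Columns are scaled before distances are measured.
- Euclidean distance is computed over the columns observed in the incomplete row.
- Each neighbour is weighted by exp(-dist).

```r
# Hypothetical helper (not the DMwR source). Assumes an all-numeric data frame,
# at least one complete row, and at least one observed value in each incomplete row.
knn_impute_sketch <- function(data, k = 10) {
  m <- as.matrix(data)
  s <- scale(m)                          # centre/scale each column, ignoring NAs
  donors <- which(complete.cases(m))     # only complete rows serve as neighbours
  for (i in which(!complete.cases(m))) {
    miss <- is.na(m[i, ])
    # Euclidean distance to every donor over the columns observed in row i
    d <- apply(s[donors, !miss, drop = FALSE], 1,
               function(r) sqrt(sum((r - s[i, !miss])^2)))
    nn <- order(d)[seq_len(min(k, length(d)))]   # the k closest donors
    w <- exp(-d[nn])                             # weights exp(-dist(k, x))
    # Weighted average of the neighbours' values for each missing column
    m[i, miss] <- colSums(m[donors[nn], miss, drop = FALSE] * w) / sum(w)
  }
  as.data.frame(m)
}

df <- data.frame(a = c(1, 2, 3, 10), b = c(1, 2, NA, 10))
out1 <- knn_impute_sketch(df, k = 1)   # b[3] copied from the single nearest row (row 2)
out2 <- knn_impute_sketch(df, k = 2)   # b[3] is a weighted mix of rows 2 and 1
```

Unlike knnImputation(), this sketch ignores factor columns, the 'median' method and distData. It is only meant to show how the distance weights enter the average.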

Example:

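# First load the package, then clean the data
library(DMwR)
data(algae)
# Drop the rows with too many NAs, then fill in the remaining NAs by k-NN
algae <- algae[-manyNAs(algae), ]
clean.algae <- knnImputation(algae, k = 10)

> head(clean.algae)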
  season  size  speed mxPH mnO2     Cl    NO3     NH4    oPO4     PO4 Chla   a1
1 winter small medium 8.00  9.8 60.800  6.238 578.000 105.000 170.000 50.0  0.0
2 spring small medium 8.35  8.0 57.750  1.288 370.000 428.750 558.750  1.3  1.4
3 autumn small medium 8.10 11.4 40.020  5.330 346.667 125.667 187.057 15.6  3.3
4 spring small medium 8.07  4.8 77.364  2.302  98.182  61.182 138.700  1.4  3.1
5 autumn small medium 8.06  9.0 55.350 10.416 233.700  58.222  97.580 10.5  9.2
6 winter small   high 8.25 13.1 65.750  9.248 430.000  18.250  56.667 28.4 15.1

4. centralImputation(): central-value imputation

Fills in missing data with the median of the non-missing values in each column. For factors, the most frequent value is used instead.

data(algae)
cleanAlgae <- centralImputation(algae)
summary(cleanAlgae)
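The idea behind centralImputation() can be sketched in base R. central_impute_sketch() is a hypothetical helper, not the DMwR function. It fills numeric columns with their median. All other columns get their most frequent non-missing value.

```r
# Hypothetical helper (not DMwR code): median for numeric columns,
# most frequent non-missing value (mode) for factor/character columns.
central_impute_sketch <- function(data) {
  for (col in names(data)) {
    v <- data[[col]]
    if (!any(is.na(v))) next
    if (is.numeric(v)) {
      fill <- median(v, na.rm = TRUE)
    } else {
      tab <- table(v)                    # table() drops NAs by default
      fill <- names(tab)[which.max(tab)] # ties go to the first value in table order
    }
    v[is.na(v)] <- fill
    data[[col]] <- v
  }
  data
}

df <- data.frame(x = c(1, NA, 3, 10),
                 f = factor(c("a", "b", "a", NA)))
out <- central_impute_sketch(df)   # x[2] becomes 3 (median of 1, 3, 10); f[4] becomes "a"
```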

5. complete.cases(): find the complete observations

x <- airquality[, -1] # x is a regression design matrix
y <- airquality[,  1] # y is the corresponding response
# Check that, for a vector, complete.cases() is the opposite of is.na()
stopifnot(complete.cases(y) != is.na(y))
# Logical vector: TRUE for rows where both x and y have no missing values
ok <- complete.cases(x, y)
sum(!ok) # how many rows are not "ok", i.e. incomplete?
# Keep only the complete observations
x <- x[ok, ]
y <- y[ok]
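A quick check with the built-in airquality data: indexing with complete.cases() keeps the same rows as na.omit(). The only difference is that na.omit() also attaches an "na.action" attribute listing the dropped rows.

```r
ok <- complete.cases(airquality)
sum(!ok)                          # number of rows with at least one NA

via_cases <- airquality[ok, ]
via_omit  <- na.omit(airquality)

# Both keep the same rows; na.omit() additionally records the dropped rows
identical(rownames(via_cases), rownames(via_omit))
length(attr(via_omit, "na.action")) == sum(!ok)
```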

6. na.fail(): check whether there are missing values

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))
na.fail(DF)

Error in na.fail.default(DF) : missing values in object
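na.fail() is often used as the na.action of a modelling function. The minimal sketch below reuses the same DF: lm() refuses to fit with na.fail, whereas na.omit silently drops the incomplete row. tryCatch() is there only so that the example keeps running after the error.

```r
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))

# With na.fail, model fitting stops as soon as an NA is present
fit_fail <- tryCatch(lm(y ~ x, data = DF, na.action = na.fail),
                     error = function(e) conditionMessage(e))
fit_fail                      # the "missing values in object" error message

# With na.omit, the incomplete third row is dropped before fitting
fit_omit <- lm(y ~ x, data = DF, na.action = na.omit)
nobs(fit_omit)                # 2 observations used

# When nothing is missing, na.fail() returns its argument unchanged
identical(na.fail(DF[1:2, ]), DF[1:2, ])
```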

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
