Cross-sectional data classification--based on R

Source: Internet
Author: User

Resources:

The book Complex Data Statistical Methods, the Internet, and R help files

Application: when the dependent variable is categorical and the independent variables include one or more categorical variables, or a categorical variable with many levels.

One.

(i) Introduction and examples

Data Source: http://archive.ics.uci.edu/ml/datasets/Cardiotocography

Independent variables: LB-FHR baseline (beats per minute)

AC-# of accelerations per second
FM-# of fetal movements per second
UC-# of uterine contractions per second
DL-# of light decelerations per second
DS-# of severe decelerations per second
DP-# of prolonged decelerations per second
Astv-percentage of time with abnormal short term variability
Mstv-mean value of short term variability
Altv-percentage of time with abnormal long term variability
Mltv-mean value of long term variability
Width-width of FHR Histogram
Min-minimum of FHR Histogram
Max-maximum of FHR Histogram
Nmax-# of Histogram peaks
Nzeros-# of histogram zeros
Mode-histogram mode
Mean-histogram Mean
Median-histogram Median
Variance-histogram Variance
Tendency-histogram Tendency
CLASS-FHR Pattern Class Code (1 to 10)

Dependent variable:

NSP-fetal state class code (N=normal; S=suspect; P=pathologic)

(ii) Generation of cross-validation datasets

1. The 10-fold cross-validation concept (Baidu Encyclopedia)

10-fold cross-validation is a common method for estimating the accuracy of an algorithm. The data set is divided into ten parts; in turn, nine of them are used as training data and the remaining one as test data. Each run yields an accuracy (or error rate), and the mean of the ten values is taken as the estimate of the algorithm's accuracy. It is usually advisable to repeat the whole procedure several times (for example, ten rounds of 10-fold cross-validation) and average the results to obtain a more stable estimate.
The choice of ten folds comes from extensive experiments with many data sets and many learning techniques, which suggest that ten is about the right number for obtaining the best error estimates; there is also some theoretical evidence for this. The question is not settled, however, and the debate continues; 5-fold and 20-fold cross-validation appear to give comparable results.

fold = function(Z = 10, w, D, seed = 7777) {
  n = nrow(w)
  d = 1:n                     # row indices
  dd = list()
  e = levels(w[, D])          # levels of the dependent variable (column D)
  T = length(e)
  set.seed(seed)
  for (i in 1:T) {            # stratify the folds by class level
    d0 = d[w[, D] == e[i]]    # rows belonging to level e[i]
    j = length(d0)
    ZT = rep(1:Z, ceiling(j / Z))[1:j]
    id = cbind(sample(ZT, length(ZT)), d0)  # random fold label for each row
    dd[[i]] = id
  }
  mm = list()
  for (i in 1:Z) {            # collect the rows assigned to each fold
    u = NULL
    for (j in 1:T) u = c(u, dd[[j]][dd[[j]][, 1] == i, 2])
    mm[[i]] = u
  }
  return(mm)                  # a list of Z vectors of row indices
}
# Read in the data
w = read.csv("CTG.NAOMIT.csv")
# Convert the last three dummy (categorical) variables to factors
F = 21:23     # column numbers of the three categorical variables (Tendency, CLASS, NSP, per the variable list above)
for (i in F) w[, i] = factor(w[, i])
D = 23        # position of the dependent variable (NSP)
Z = 10        # number of folds
n = nrow(w)   # number of rows
mm = fold(Z, w, D, 8888)
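To make the fold assignment concrete, here is a self-contained sketch of how such a list of index vectors is used to form one training/test split. It substitutes the built-in iris data and a plain (non-stratified) split so it runs without the CTG file; these stand-ins are assumptions, not part of the original code.

```r
# Sketch only: iris stands in for the CTG data frame w.
# Each element of mm holds the row indices of one fold.
w <- iris
Z <- 10
n <- nrow(w)
set.seed(8888)
mm <- split(sample(1:n), rep(1:Z, length.out = n))  # Z folds of row indices

i <- 1
test  <- w[ mm[[i]], ]   # fold i is held out as the test set
train <- w[-mm[[i]], ]   # the remaining Z - 1 folds form the training set
nrow(train) + nrow(test) == n   # every row is used exactly once
```

With the fold() function above, the split(...) line would instead be mm = fold(Z, w, D, 8888), which additionally stratifies the folds by the levels of column D.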

Two. Decision Tree Classification (Classification Tree)

library(rpart.plot)
(a = rpart(NSP ~ ., w))  # fit a decision tree to all the data and print it
rpart.plot(a, type = 2, extra = 4)
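The tree above is fit to all the data; combining it with the folds from section one gives a cross-validated error estimate. A hedged sketch, again using the built-in iris data and a plain split so it runs stand-alone (for the CTG data, substitute w, NSP ~ ., and the mm produced by fold()):

```r
library(rpart)  # rpart ships with R as a recommended package

w <- iris
Z <- 10
n <- nrow(w)
set.seed(8888)
mm <- split(sample(1:n), rep(1:Z, length.out = n))  # fold indices

err <- numeric(Z)
for (i in 1:Z) {
  a <- rpart(Species ~ ., data = w[-mm[[i]], ])         # train on Z - 1 folds
  pred <- predict(a, newdata = w[mm[[i]], ], type = "class")
  err[i] <- mean(pred != w[mm[[i]], "Species"])         # error rate on fold i
}
mean(err)  # average misclassification rate over the Z folds
```

Averaging the per-fold error rates is exactly the accuracy estimate described in the cross-validation section above.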

rpart.plot parameter explanation:

x:

An rpart object. The only required argument.

type:

Type of plot. Five possibilities:

0 The default. Draw a split label at each split and a node label at each leaf.

1 Label all nodes, not just leaves. Similar to text.rpart's all=TRUE.

2 Like 1 but draw the split labels below the node labels. Similar to the plots in the CART book.

3 Draw separate split labels for the left and right directions.

4 Like 3 but label all nodes, not just leaves. Similar to text.rpart's fancy=TRUE. See also clip.right.labs.

extra:

Display extra information at the nodes. Possible values:

0 No extra information (the default).

1 Display the number of observations that fall in the node (per class for class objects; prefixed by the number of events for Poisson and exp models). Similar to text.rpart's use.n=TRUE.

2 Class models: display the classification rate at the node, expressed as the number of correct classifications and the number of observations in the node. Poisson and exp models: display the number of events.

3 Class models: misclassification rate at the node, expressed as the number of incorrect classifications and the number of observations in the node.

4 Class models: probability per class of observations in the node (conditioned on the node; the sum across a node is 1).

5 Class models: like 4 but do not display the fitted class.

6 Class models: the probability of the second class only. Useful for binary responses.

7 Class models: like 6 but do not display the fitted class.

8 Class models: the probability of the fitted class.

9 Class models: the probabilities times the fraction of observations in the node (the probability relative to all observations; the sum across all leaves is 1).

branch:

Controls the shape of the branch lines. Specify a value between 0 (V-shaped branches) and 1 (square-shouldered branches). Default is if (fallen.leaves) 1 else .2.

[Figures omitted: the same tree drawn with branch = 0 and with branch = 1]
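The two branch settings can be compared with a minimal sketch (assumes the rpart.plot package is installed; a tree fit to the built-in iris data stands in for the CTG model):

```r
library(rpart.plot)                   # also loads rpart
a <- rpart(Species ~ ., data = iris)  # stand-in for the model a fit above
rpart.plot(a, branch = 0)             # V-shaped branch lines
rpart.plot(a, branch = 1)             # square-shouldered branch lines
```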

digits:

The number of significant digits in displayed numbers. Default 2.

rpart.plot(a, extra = 4, digits = 4)
