R Language and Data Analysis (3): Classification Algorithms, Part 2


In the last installment we shared traditional classification algorithms based on discriminant functions, in which the class of a target sample is determined by the value of the discriminant; these methods rest on a basic assumption of linearity. Today we continue with two more modern classification algorithms: decision trees and neural networks. Both algorithms come out of the disciplines of artificial intelligence and machine learning.

First, let us introduce a classic algorithm from the data mining field: KNN, the k-nearest-neighbors algorithm.

The basic idea of the algorithm:

STEP 1: Compute the distance (Euclidean or Mahalanobis) between the sample to be classified and every point in the learning set, sort by distance, and select the K nearest learning points;

STEP 2: Count how those K learning points are distributed among the classes; the class with the highest frequency is assigned to the sample to be classified.
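As an illustration (not from the original post), here is a minimal KNN sketch using the knn() function from the class package, applied to the iris data with an assumed split of 100 training points and k = 5:

install.packages("class")
library(class)
set.seed(1)
idx <- sample(1:nrow(iris), 100)                 # 100 training points, the rest for testing
train <- iris[idx, 1:4]
test <- iris[-idx, 1:4]
pred <- knn(train, test, cl = iris$Species[idx], k = 5)  # classify each test point by its 5 nearest neighbors
table(pred, iris$Species[-idx])                  # confusion matrix: predictions vs. truth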

Decision Trees

This algorithm comes mainly from artificial intelligence, where it is often used to model game-theoretic decision processes (the classic textbook illustration is a woman deciding, step by step, whether to meet a male suitor). The attributes of a decision-tree learning set need not be numeric: they can be categorical (factors) or logical. In the tree-building process we look for the attribute with the greatest information gain to serve as the root node, then step down through attributes of successively smaller information gain as the decision points of the next layers; layering all the attributes in this way yields the decision tree. The most widely used algorithm is ID3 and its later upgraded versions (such as C4.5).
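To make "information gain" concrete, here is a small hand-rolled sketch (the entropy() and info.gain() helpers are our own illustration, not from any package), scoring a discretized iris attribute the way ID3 would:

entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]                        # drop empty classes to avoid log2(0)
  -sum(p * log2(p))
}
info.gain <- function(x, y) {          # x: attribute values, y: class labels
  g <- entropy(y)                      # start from the entropy of the classes
  for (v in unique(x)) {
    w <- mean(x == v)                  # weight of this attribute value
    g <- g - w * entropy(y[x == v])    # subtract the weighted conditional entropy
  }
  g
}
petal <- cut(iris$Petal.Length, 3)     # discretize a numeric attribute into 3 bins
info.gain(petal, iris$Species)         # the larger the gain, the better the split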


Now let's see how R helps us with decision tree analysis. We will use the iris data set, and we need to load the rpart package for the decision-tree analysis:

Install.packages ("Rpart") library (Rpart) Iris.rp=rpart (species~.,data=iris,method= "class") plot (iris.rp,uniform=t , branch=0,margin=0.01,main= "DecisionTree") text (iris.rp,use.n=t,fancy=t,col= "Blue")

The result is a plot of the fitted decision tree.
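As a supplementary check (our own addition, not in the original post), the fitted tree can also be evaluated on the training data:

pred <- predict(iris.rp, iris, type = "class")   # predicted class for each observation
table(pred, iris$Species)                        # confusion matrix against the true species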

Artificial Neural Networks (ANN)


A model is built from the learning set (take the perceptron as an example). In the original figure, 0.3 is the weight on each input branch and 0.4 is the bias term (t); a weighted sum followed by the sign function serves as the activation function in this example (other activation functions, such as trigonometric or exponential ones, are also possible). An artificial neural network corrects its weights from the learning set through a negative-feedback process. The specific algorithm is as follows:

STEP 1: Let D = {(xi, yi) | i = 1, 2, ..., N} be the training set;
STEP 2: Randomly generate an initial weight vector w;
STEP 3: repeat
          for each training example (xi, yi):
            compute the predicted output ŷi(k);
            for each weight wj: update wj(k+1) = wj(k) + a * (yi - ŷi(k)) * xij;
          end for
        until the termination condition is met.
PS: a is the learning rate, usually a small number.
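Below is a minimal R sketch of this update rule (the perceptron() function and the AND-gate data are our own illustration, not from the original post); targets are coded as -1/+1, as in the AMORE example further below:

perceptron <- function(X, y, a = 0.1, epochs = 100) {
  X <- cbind(1, X)                                 # prepend a constant input so w[1] acts as the bias
  w <- runif(ncol(X), -0.5, 0.5)                   # STEP 2: random initial weights
  for (k in 1:epochs) {                            # STEP 3: repeat (fixed number of epochs here)
    for (i in 1:nrow(X)) {
      yhat <- if (sum(w * X[i, ]) >= 0) 1 else -1  # predicted output, sign of the weighted sum
      w <- w + a * (y[i] - yhat) * X[i, ]          # wj(k+1) = wj(k) + a * (yi - ŷi) * xij
    }
  }
  w
}
# Example: learn a logical AND, a linearly separable problem
X <- cbind(c(0, 0, 1, 1), c(0, 1, 0, 1))
y <- c(-1, -1, -1, 1)
w <- perceptron(X, y)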

The problems we face are often complex and require the construction of multilayer neural networks.

Next, we will share how the R language implements artificial neural network analysis. We need to install the AMORE package; we will solve the three-input classification problem (x1, x2, x3 → y) mentioned above:

library(AMORE)
x1 <- c(1, 1, 1, 1, 0, 0, 0, 0)
x2 <- c(0, 0, 1, 1, 0, 1, 1, 0)
x3 <- c(0, 1, 0, 1, 1, 0, 1, 0)
y <- c(-1, 1, 1, 1, -1, -1, 1, -1)
p <- cbind(x1, x2, x3)
target <- y
net <- newff(n.neurons = c(3, 1, 1), learning.rate.global = 1e-2,
             momentum.global = 0.4, error.criterium = "LMS", Stao = NA,
             hidden.layer = "tansig", output.layer = "purelin",
             method = "ADAPTgdwm")
# n.neurons = c(number of input nodes, ..., hidden nodes, number of output nodes)
# error.criterium = "LMS": convergence criterion, the least-mean-squares method
# hidden.layer = "tansig": hidden-layer activation function
# output.layer = "purelin": output-layer activation function
result <- train(net, p, target, error.criterium = "LMS", report = TRUE,
                show.step = 100, n.shows = 5)
z <- sim(result$net, p)

The training report and the simulated output z are then displayed.

Taking the sign of z makes the classes distinguishable; comparing sign(z) with y shows that the neural network's results agree with the target values 100%.
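As a quick check (our own addition), the agreement can be tabulated directly:

table(sign(z), y)   # with perfect agreement, all counts fall on the (-1, -1) and (1, 1) cells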


Here we can see the great appeal of artificial neural networks: we do not need to understand the internal algorithm in detail; we only need to determine the inputs and outputs and set the corresponding numbers of nodes to complete the classification with ease. The number of hidden layers, however, requires some analysis; it is not true that more hidden layers make the model more accurate, for two reasons:

1. For problems of modest complexity, extra hidden layers waste an excessive amount of unnecessary training time;

2. More hidden layers can indeed give a better fit to the learning set, but note that overfitting the learning set leads to large errors in prediction.

The black-box nature of neural networks is a double-edged sword. On the one hand, the black box brings us great convenience; on the other hand, its opacity makes it impossible to interpret the model in terms of the business, so neural networks call for new ideas to restructure the algorithm. The appearance of the Hopfield neural network addressed such shortcomings of early neural networks as the black box and overfitting.

As for Hopfield networks, readers can look them up on Baidu themselves. Good luck, everyone!



