An introduction to the gradient boosting algorithm in R


In general, we can improve the accuracy of a predictive model in two ways: refine the feature engineering, or apply a boosting algorithm directly. Experience from a large number of data science competitions shows that people tend to prefer boosting, because it usually produces comparable results in less time than other approaches.

There are many kinds of boosting algorithms, such as gradient boosting, XGBoost, AdaBoost, GentleBoost, and so on. Each has its own theoretical basis, and by using them you will start to notice the subtle differences between them. If you are a beginner, great: you can spend about a week learning this material from scratch.

In this article, I will introduce you to the basic concepts behind the gradient boosting algorithm and its complexity; the article also shares an example of how to implement the algorithm in R.

A quick Q&A

Whenever we talk about boosting algorithms, two concepts come up frequently: bagging and boosting. So what are these two concepts, and what is the difference between them? Here is a quick explanation:

Bagging: a method that draws random samples of the data, builds a learner on each sample, and obtains the final prediction by simple averaging.

Boosting: similar to bagging, but more intelligent in how it selects samples: as the algorithm proceeds, observations that are difficult to classify are given increasingly large weights.
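To make the bagging idea concrete, here is a minimal sketch in R (my own illustration, not code from the original article): it draws bootstrap samples, fits a small regression tree from the rpart package on each sample, and averages the predictions. All data and variable names are made up for illustration.

# Minimal bagging sketch (illustrative only)
library(rpart)

set.seed(1)
n <- 200
toy <- data.frame(x = runif(n, -3, 3))
toy$y <- sin(toy$x) + rnorm(n, sd = 0.3)

B <- 25                                         # number of bootstrap samples
preds <- matrix(NA, nrow = n, ncol = B)
for (b in 1:B) {
  idx <- sample(n, replace = TRUE)              # bootstrap sample of the rows
  fit <- rpart(y ~ x, data = toy[idx, ])        # weak learner on that sample
  preds[, b] <- predict(fit, newdata = toy)     # predict on the full data
}
bagged <- rowMeans(preds)                       # simple average = the bagging prediction

How boosting reweights the difficult observations, by contrast, is the subject of the rest of this article.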

At this point you may be wondering: what counts as "difficult to classify"? And how do I know how much extra weight to give a misclassified observation? Please stay calm; we will answer these questions in the sections that follow.

Starting with a simple example

Suppose you have an initial prediction model M whose accuracy needs to be improved, and you know that the model is currently 80% accurate (by whatever metric you measure it). What do you do next?

One option is to build an entirely new model on a new set of input variables and then try to combine the two. However, I would like to suggest a simpler approach, as follows:

Y = M(x) + error

If we observe that the error term is not white noise but is instead correlated with the outcome Y, why not use this error term to improve the accuracy of the model? For example:

error = G(x) + error2

Perhaps we will find that the accuracy of the combined model has increased to a higher number, say 84%. The next step is then to regress on error2:

error2 = H(x) + error3

Then we combine these equations:

Y = M(x) + G(x) + H(x) + error3

Such a result could push the accuracy of the model beyond 84%. And if we could find an optimal set of weights for the three learners, like this:

Y = alpha * M(x) + beta * G(x) + gamma * H(x) + error4

Well, we might have built a better model.
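To make this residual-fitting idea concrete, here is a minimal sketch in R (my own illustration, not code from the original article): it fits a first model M, fits G to the residuals of M, fits H to the remaining residuals, and then adds the three predictions together. The data and names are made up; shallow rpart trees stand in for M, G, and H.

library(rpart)

set.seed(42)
n <- 300
dat <- data.frame(x = runif(n, -3, 3))
dat$y <- dat$x^2 + rnorm(n)

# Step 1: fit an initial model M(x) and compute its residuals (the "error" term)
M <- rpart(y ~ x, data = dat, control = rpart.control(maxdepth = 2))
dat$res1 <- dat$y - predict(M, dat)

# Step 2: fit G(x) to those residuals, then compute what is left over ("error2")
G <- rpart(res1 ~ x, data = dat, control = rpart.control(maxdepth = 2))
dat$res2 <- dat$res1 - predict(G, dat)

# Step 3: fit H(x) to the remaining residuals ("error3" is whatever is still left)
H <- rpart(res2 ~ x, data = dat, control = rpart.control(maxdepth = 2))

# Combined prediction: Y is approximated by M(x) + G(x) + H(x)
pred <- predict(M, dat) + predict(G, dat) + predict(H, dat)

mean((dat$y - predict(M, dat))^2)   # mean squared error of M alone
mean((dat$y - pred)^2)              # mean squared error of the combined model (lower)

The trees are kept shallow on purpose: each one is a weak learner, so there is still structure left in its residuals for the next learner to pick up.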

The above is the basic principle behind a boosting algorithm. When I first came across this theory, two small questions quickly popped into my mind:

1. How do we determine whether the error term in the regression/classification equation is white noise? If we cannot tell, how can we use this algorithm? (An informal residual check is sketched right after these two questions.)

2. If this algorithm is really so powerful, can we push model accuracy close to 100%?
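On the first question, a simple informal check is to see whether the residuals of a model are still predictable from the inputs; if they are, they are not white noise, and a second learner can exploit them. A minimal sketch in R (my own illustration, with made-up data):

set.seed(1)
x <- runif(500, -3, 3)
y <- x^2 + rnorm(500)               # the true signal is nonlinear in x

M <- lm(y ~ x)                      # a deliberately weak model: a straight line
err <- residuals(M)

cor(err, x^2)                       # far from zero, so the error is not white noise
# Because err is still predictable from x, a second model G fitted to err
# can recover part of this leftover signal, which is exactly the boosting idea.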

We will answer these questions next, but two things should be made clear: first, boosting is usually applied to weak learners, which do not have the capacity to fit everything and leave behind only white noise; second, boosting can lead to overfitting, so we have to stop the algorithm at the right point.
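One common way to "stop at the right point" is to choose the number of boosting iterations by cross-validation. Here is a minimal sketch using the gbm package (my own illustration with synthetic data; the original article does not include this code, although it uses GBM via caret later on):

library(gbm)

set.seed(7)
n <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- as.integer(df$x1 + df$x2 + rnorm(n) > 0)   # binary 0/1 outcome

# Fit a gradient boosting model with many trees and 5-fold cross-validation
fit <- gbm(y ~ x1 + x2, data = df,
           distribution = "bernoulli",
           n.trees = 500, shrinkage = 0.05,
           interaction.depth = 2, cv.folds = 5)

# Choose the iteration at which the cross-validated error stops improving
best_iter <- gbm.perf(fit, method = "cv")

# Predict probabilities using only that many trees, avoiding overfitting
p <- predict(fit, newdata = df, n.trees = best_iter, type = "response")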

Try to imagine a classification problem

(Figure from the original article: a small two-dimensional dataset of "+" and "-" points separated by a vertical line.)

In the left-hand panel of the figure, the vertical line is the classifier built in the first round, and it misclassifies 3 of the 10 observations. Next, we give those three misclassified "+" points higher weights, making them very important when the next classifier is built; as a result, the vertical line moves toward the right-hand side of the plot. After repeating this process several times, we combine all of the resulting models with an appropriate set of weights.

The theoretical basis of the algorithm

How do we assign weights to the observations?

In general, we start from a uniform distribution, which we call D1: each of the n observations is assigned a weight of 1/n.

Step 1: Assume a value α_t;

Step 2: Obtain the weak classifier h_t;

Step 3: Update the distribution over the observations:

D_{t+1}(i) = D_t(i) * exp( -α_t * y_i * h_t(x_i) ) / Z_t

where Z_t is a normalization factor chosen so that D_{t+1} sums to 1.

Step 4: Use the new distribution to train the next weak classifier;

Does the math in Step 3 look scary? Let's break it down. First, look at the exponent: α is the learning rate (the weight given to the classifier), y is the actual response (+1 or -1), and h(x) is the class predicted by the classifier. Put simply, when the classifier predicts wrongly, y * h(x) = -1 and the exponent becomes +α; when it predicts correctly, the exponent becomes -α. In other words, an observation that was misclassified in the previous round gets a larger weight in the next round. So what comes next?

Step 5: Repeat Steps 1-4 until no further improvement can be found;

Step 6: Take a weighted average of all the classifiers produced in the steps above, using the α_t values as weights; the final classifier is H(x) = sign( Σ_t α_t * h_t(x) ).
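To see these steps in action, here is a compact AdaBoost-style sketch in R that uses decision stumps as the weak classifiers. It is my own illustration of the formulas above, not code from the original article; in particular, setting α_t = 0.5 * log((1 - err_t) / err_t) is the standard AdaBoost choice, an assumption beyond the article's "assume a value α_t".

set.seed(3)
n <- 200
X <- data.frame(x1 = runif(n), x2 = runif(n))
y <- ifelse(X$x1 + X$x2 > 1, 1, -1)               # labels in {+1, -1}

T_rounds <- 10
w <- rep(1 / n, n)                                 # uniform starting distribution D1
alphas <- numeric(T_rounds)
stumps <- vector("list", T_rounds)

# Weak learner: the single-variable threshold split with the lowest weighted error
fit_stump <- function(X, y, w) {
  best <- list(err = Inf)
  for (j in seq_along(X)) {
    for (thr in unique(X[[j]])) {
      for (sgn in c(1, -1)) {
        pred <- ifelse(X[[j]] > thr, sgn, -sgn)
        err <- sum(w * (pred != y))
        if (err < best$err) best <- list(var = j, thr = thr, sgn = sgn, err = err)
      }
    }
  }
  best
}
predict_stump <- function(s, X) ifelse(X[[s$var]] > s$thr, s$sgn, -s$sgn)

for (t in 1:T_rounds) {
  s <- fit_stump(X, y, w)                          # Step 2: weak classifier h_t
  alpha <- 0.5 * log((1 - s$err) / max(s$err, 1e-10))  # Step 1: alpha_t (standard AdaBoost choice)
  pred <- predict_stump(s, X)
  w <- w * exp(-alpha * y * pred)                  # Step 3: reweight the observations
  w <- w / sum(w)                                  #         normalise (divide by Z_t)
  alphas[t] <- alpha; stumps[[t]] <- s             # Step 4: next round uses the new weights
}

# Step 6: final classifier = sign of the alpha-weighted vote of all stumps
scores <- Reduce(`+`, lapply(1:T_rounds, function(t) alphas[t] * predict_stump(stumps[[t]], X)))
final_pred <- sign(scores)
mean(final_pred == y)                              # training accuracy of the combined model

Each stump on its own is a poor classifier, but the weighted vote of all rounds classifies the toy data well, which is exactly the behaviour the six steps describe.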

Case Practice

I recently took part in an online hackathon organized by Analytics Vidhya. To make the variable transformations easier, we combined all of the test-set and training-set data into complete_data. We then import the data, draw samples, and split it as follows:

library(caret)
rm(list = ls())
setwd("C:\\Users\\ts93856\\Desktop\\AV")
library(Metrics)

complete <- read.csv("complete_data.csv", stringsAsFactors = TRUE)
train <- complete[complete$Train == 1, ]
score <- complete[complete$Train != 1, ]

set.seed(999)
ind <- sample(2, nrow(train), replace = TRUE, prob = c(0.60, 0.40))
trainData <- train[ind == 1, ]
testData <- train[ind == 2, ]

set.seed(999)
ind1 <- sample(2, nrow(testData), replace = TRUE, prob = c(0.50, 0.50))
trainData_ens1 <- testData[ind1 == 1, ]
testData_ens1 <- testData[ind1 == 2, ]

table(testData_ens1$Disbursed)[2] / nrow(testData_ens1)
# Response rate of 9.052%

The next step is to build a gradient boosting model (GBM), as follows:

fitControl <- trainControl(method = "repeatedcv", number = 4, repeats = 4)
trainData$outcome1 <- ifelse(trainData$Disbursed == 1, "Yes", "No")

set.seed(33)
gbmFit1 <- train(as.factor(outcome1) ~ ., data = trainData[, -26],
                 method = "gbm", trControl = fitControl, verbose = FALSE)

gbm_dev <- predict(gbmFit1, trainData, type = "prob")[, 2]
gbm_ITV1 <- predict(gbmFit1, trainData_ens1, type = "prob")[, 2]
gbm_ITV2 <- predict(gbmFit1, testData_ens1, type = "prob")[, 2]

auc(trainData$Disbursed, gbm_dev)
auc(trainData_ens1$Disbursed, gbm_ITV1)
auc(testData_ens1$Disbursed, gbm_ITV2)
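If you want more control over the boosting hyperparameters, caret's train also accepts an explicit tuning grid. The sketch below is an illustrative extension of the code above; the grid values are my own choices, not settings from the original article.

# Illustrative tuning grid for caret's "gbm" method (values are assumptions)
gbmGrid <- expand.grid(n.trees = c(100, 300, 500),
                       interaction.depth = c(2, 4),
                       shrinkage = c(0.05, 0.1),
                       n.minobsinnode = 10)

set.seed(33)
gbmFit2 <- train(as.factor(outcome1) ~ ., data = trainData[, -26],
                 method = "gbm", trControl = fitControl,
                 tuneGrid = gbmGrid, metric = "Kappa", verbose = FALSE)

gbmFit2$bestTune   # the combination selected by repeated cross-validation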

In the case above, all of the AUC values you see after running the code should be very close to 0.84. We always welcome suggestions for further refining this code. Gradient boosting models (GBM) are the most widely used method in this area, and in future articles we may introduce some of the more efficient boosting algorithms, such as XGBoost.

Concluding remarks

I have seen more than once that boosting algorithms are fast and effective; in competitions on Kaggle and other platforms, their scoring ability has never disappointed. Of course, how far they take you may still depend on how well you do your feature engineering.

Original author: Tavish Srivastava

Translation: sdcry!!!

Original link: http://www.analyticsvidhya.com/blog/2015/09/complete-guide-boosting-methods/

