Comparing Decision Trees and Logistic Regression

Many target variables in marketing prediction models are statuses or types, such as "buy" or "don't buy", "broadband" or "dial-up", or "email, phone, or web" as the marketing channel. Problems of this kind are collectively referred to as "classification". Decision trees and logistic regression both specialize in solving classification problems. Using different algorithms to answer the same question naturally invites debate over which is better, but so far there is no clear conclusion. This is to be expected, because how each performs depends on the state of the data and the skill of the analyst. From the standpoint of the algorithms themselves, decision trees and logistic regression each have their own strengths. The best practice is therefore not to choose one over the other, but to combine the two so that the strengths of each make up for the weaknesses of the other.

Before further discussion, let's take a look at the main differences between logistic regression and decision trees.

Some differences are superficial. For example, a decision tree can handle missing values directly, while logistic regression requires the analyst to treat missing data in advance. In fact, the decision tree also makes assumptions when handling missing values: in CART, for instance, an observation with a missing value is routed by splitting on a surrogate (secondary) variable. The same kind of treatment can be applied before logistic regression, but it requires separate programming, whereas in decision trees this step is already embedded in the software's algorithm engine.
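As a rough illustration, here is a minimal sketch (assuming Python with scikit-learn and NumPy, on synthetic data) of the extra imputation step logistic regression needs before it can see data with missing values; tree-based software typically absorbs this step for you, although scikit-learn's trees do not implement CART-style surrogate splits.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import HistGradientBoostingClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    X[rng.random(X.shape) < 0.1] = np.nan     # inject roughly 10% missing values
    y = rng.integers(0, 2, size=200)          # toy binary target

    # Logistic regression: the analyst must impute (and justify the strategy)
    # before fitting; here a simple median fill inside a pipeline.
    logit = make_pipeline(SimpleImputer(strategy="median"),
                          LogisticRegression(max_iter=1000))
    logit.fit(X, y)
    print(logit.predict_proba(X[:5])[:, 1])

    # A tree-based learner, by contrast, can accept the NaNs directly (here
    # HistGradientBoostingClassifier, which routes missing values during
    # splitting rather than using surrogate variables).
    tree_like = HistGradientBoostingClassifier().fit(X, y)
    print(tree_like.predict_proba(X[:5])[:, 1])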
In essence, the differences between decision trees and logistic regression are as follows:

1. Logistic regression is better at analyzing the overall structure of the data, while decision trees are better at analyzing local structure.

2. Logistic regression is good at capturing linear relationships, while decision trees grasp linear relationships poorly (see the sketch after this list). Handling non-linear relationships is the strength of decision trees, but many non-linear relationships can be well approximated by linear ones, and linear relationships have practical advantages: they are concise, easy to understand, and to some extent guard against overfitting the data.

3. Logistic regression is sensitive to extreme values and is easily pulled around by them, whereas decision trees are more robust in this respect.
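To make point 2 concrete, here is a small sketch (assuming Python with scikit-learn, on synthetic data whose log-odds really are linear in the predictor): the logistic regression captures the trend with a single coefficient, while the tree can only approximate it with a step function that is constant within each leaf.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, size=2000)
    p = 1 / (1 + np.exp(-1.5 * x))            # true model: logit(p) = 1.5 * x
    y = rng.binomial(1, p)
    X = x.reshape(-1, 1)

    logit = LogisticRegression().fit(X, y)
    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

    grid = np.linspace(-3, 3, 7).reshape(-1, 1)
    print("logistic P(y=1):", np.round(logit.predict_proba(grid)[:, 1], 2))
    print("tree     P(y=1):", np.round(tree.predict_proba(grid)[:, 1], 2))
    # The logistic curve varies smoothly across the grid; the tree's estimates
    # change only at its split points, i.e. a piecewise-constant approximation.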
These differences stem from the logic of the algorithms. The decision tree works by segmentation, so it can dig deep into the details of the data, but at the cost of a global view: once a split is made, the relationship between that branch and other branches or nodes is severed, and later mining can only proceed locally. At the same time, because of the repeated segmentation, the sample size keeps shrinking, so the tree cannot support testing many variables simultaneously. Logistic regression always focuses on fitting the entire dataset, so it has a better grasp of the global picture, but it cannot attend to local patterns, or rather it lacks an internal mechanism for exploring local structure.
Beyond this, there are also differences in application. The results of a decision tree are somewhat coarse compared with logistic regression: in principle, logistic regression can provide a probability for every observation in the data, while a decision tree can only divide the population into a finite number of probability groups. For example, if a decision tree ends up with 17 terminal nodes, the whole population can take on only 17 distinct probabilities, which limits its use. Operationally, decision trees are relatively easy to use and require little data preprocessing, while logistic regression demands a certain amount of training and skill.
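The "17 probabilities" point is easy to see directly. A quick sketch (assuming Python with scikit-learn, on synthetic data) that counts the distinct scores each model can produce:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

    tree = DecisionTreeClassifier(max_leaf_nodes=17, random_state=0).fit(X, y)
    logit = LogisticRegression(max_iter=1000).fit(X, y)

    print("tree terminal nodes         :", tree.get_n_leaves())
    print("distinct tree probabilities :", len(np.unique(tree.predict_proba(X)[:, 1])))
    print("distinct logit probabilities:", len(np.unique(logit.predict_proba(X)[:, 1])))
    # The tree can emit at most one probability per terminal node (17 here),
    # while logistic regression scores essentially every observation differently.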
The main idea behind combining, or mutually enhancing, the two is to use the decision tree's better grasp of local data structure to increase the effectiveness of logistic regression. There are several concrete approaches. One is to find local structure in the data through decision tree analysis and use it as the basis for constructing interaction terms in the logistic regression. Another is to use decision tree analysis to determine the optimal split points when a predictor needs to be discretized. A third is to use the terminal nodes of the decision tree as a predictor variable and enter it, together with other covariates, into the regression model; this is known as the "grafting model". In theory, the grafting model combines the advantages of decision trees and logistic regression: the terminal nodes capture important local structure in the data, and the covariates can recover the overall structure that the decision tree missed.
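Below is a minimal sketch of the grafting idea (assuming Python with scikit-learn; the split of predictors into "tree variables" and covariates is purely illustrative): fit a decision tree, turn each observation's terminal node into dummy variables, and enter those dummies into a logistic regression together with the remaining covariates.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=3000, n_features=8, random_state=0)
    X_tree, X_covars = X[:, :4], X[:, 4:]     # illustrative split of predictors

    # Step 1: the decision tree captures local structure on part of the predictors.
    tree = DecisionTreeClassifier(max_leaf_nodes=10, random_state=0).fit(X_tree, y)
    leaf_id = tree.apply(X_tree).reshape(-1, 1)   # terminal-node membership

    # Step 2: terminal nodes become dummy variables ...
    leaf_dummies = OneHotEncoder().fit_transform(leaf_id).toarray()

    # ... which enter the logistic regression alongside the covariates that are
    # meant to recover global structure the tree missed.
    X_grafted = np.hstack([leaf_dummies, X_covars])
    grafted = LogisticRegression(max_iter=1000).fit(X_grafted, y)
    print("predictors in the grafted model:", X_grafted.shape[1])

If, as the next paragraph notes, the covariates add nothing on top of the leaf dummies, the grafted model collapses back into the tree itself.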

The grafting model is a clever design, but it has not been widely endorsed in practice. Since the decision tree has already fitted the data to the greatest extent possible, there is little room left for the covariates; in other words, when the terminal nodes of the decision tree are used as a predictor, it may be impossible to find covariates that contribute independently. Without covariates, the logistic regression is effectively just a duplicate of the decision tree. In addition, the nodes are hard to interpret because each is a combination of multiple attributes; what each node represents is unclear, which limits the adoption of this method.

 

This article is from http://blog.sina.com.cn/s/blog_652090850100gwxl.html

 
