Data Mining Classification Technology

Last Update:2018-12-05 Source: Internet

Author: User

Tags svm neural net


Many specific classification techniques have been developed since the classification problem was first posed. The following describes the four most common ones. Algorithm implementation and optimization are not the focus of this book, so we try to describe these techniques in language that application practitioners can understand. Chapter 4 returns to classification algorithms and their underlying principles.

Before learning these algorithms, we must be clear that no classification algorithm is perfectly accurate. Each algorithm, when run on a test set, achieves a measurable accuracy. Classifiers built with different algorithms also perform differently on different datasets.
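The accuracy indicator mentioned above can be sketched in a few lines. This is a minimal illustration rather than a full evaluation procedure; the function name `accuracy` and the spam/ham labels are hypothetical and do not come from the book.

```python
def accuracy(predicted, actual):
    """Fraction of test samples whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("the test set must not be empty")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical test-set labels and the predictions of one classifier.
actual_labels = ["spam", "ham", "ham", "spam", "ham"]
predicted_labels = ["spam", "ham", "spam", "spam", "ham"]

# Four of the five predictions match the true labels.
test_accuracy = accuracy(predicted_labels, actual_labels)
```

Running the same function on the predictions of a second classifier, or on a second test set, is how the differences in performance described above would show up.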

KNN, K-Nearest Neighbor Algorithm

The K-nearest neighbor (K-Nearest Neighbor, KNN) classification algorithm is arguably the simplest method in data mining classification. "K nearest neighbors" means the K closest neighbors: each sample can be represented by its K nearest neighbors.

We use a simple example to illustrate the idea behind the KNN algorithm. Suppose you live in a residential community in the city center, and houses of the same type in the surrounding communities sell for 280 to 300. We can then group your house with its neighbors and expect it to sell for 280 to 300 as well. Similarly, if your friend lives in a suburb where similar houses sell for 110 to 120, his house will also be priced at 110 to 120.

The core idea of the KNN algorithm is that if most of the K samples most similar to a given sample in the feature space belong to a certain category, then that sample also belongs to the category and shares the characteristics of the samples in it. When making a classification decision, the method determines the category of the sample to be classified based only on the categories of the nearest one or several samples, so the decision involves only a very small number of adjacent samples. Because KNN relies mainly on a limited number of neighboring samples rather than on discriminating class domains, it is more suitable than other methods for sample sets whose class domains cross or overlap heavily.
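The majority vote among the K nearest samples can be sketched as follows. This is a minimal illustration that assumes Euclidean distance as the similarity measure (the text does not name one). The function `knn_classify`, the two features, and the sample houses are hypothetical values loosely modeled on the house-price example.

```python
import math
from collections import Counter


def knn_classify(query, samples, k=3):
    """Label `query` by majority vote among its k nearest labeled samples.

    samples is a list of (feature_vector, label) pairs.
    """
    # Sort the labeled samples by Euclidean distance to the query point.
    by_distance = sorted(samples, key=lambda s: math.dist(query, s[0]))
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical houses: (distance to city center in km, floor area) -> price band.
houses = [
    ((1.0, 90.0), "280-300"),
    ((1.5, 95.0), "280-300"),
    ((2.0, 88.0), "280-300"),
    ((20.0, 92.0), "110-120"),
    ((22.0, 90.0), "110-120"),
    ((25.0, 94.0), "110-120"),
]

downtown = knn_classify((1.2, 91.0), houses, k=3)  # neighbors are all city-center houses
suburb = knn_classify((21.0, 93.0), houses, k=3)   # neighbors are all suburban houses
```

Using an odd K with two classes avoids tied votes. In practice the features would usually be rescaled first so that no single attribute dominates the distance.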

Decision Tree

If KNN is the simplest method, the decision tree is probably the most intuitive and easiest to understand classification algorithm. In its simplest form, a decision tree is a series of if-then (if ... then ...) decisions.

For example, the following decision tree divides all samples into four categories, "rich and handsome", "handsome", "rich", and "diaosi", based on each sample's appearance and wealth attributes.

If (obj.Appearance = "handsome") then
{
    If (obj.Fortune >= 1000000000) then
    {
        Print(obj.Name + " is rich and handsome");
    }
    Else
    {
        Print(obj.Name + " is handsome");
    }
}
Else
{
    If (obj.Fortune >= 1000000000) then
    {
        Print(obj.Name + " is rich");
    }
    Else
    {
        Print(obj.Name + " is a diaosi");
    }
}

Each node of a decision tree is either a new decision node or a leaf that represents a class, and each branch represents the outcome of a test. Decision nodes test attributes, and every leaf node is a category. The problem a decision tree has to solve is which attributes to use as the nodes of the tree. The most important of these is the root node: no node sits above it, and all other attributes are its descendants. In the preceding example, (obj.Appearance = "handsome") is the root node, the two (obj.Fortune >= 1000000000) tests are the two decision nodes in the layer below the root, and the four Print statements are the four leaf nodes, each corresponding to one category.

Once passed through the decision tree, every object is classified into one of the four categories according to its "appearance" and "wealth" attributes.
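The node structure described above (a root node, decision nodes, and leaves) can also be written as data rather than nested if-then statements. The sketch below is one possible representation; the class names `Leaf` and `Decision`, the dictionary keys, and the sample people are all hypothetical.

```python
class Leaf:
    """A leaf node: holds the category assigned to objects that reach it."""

    def __init__(self, category):
        self.category = category


class Decision:
    """A decision node: tests one attribute and branches on the outcome."""

    def __init__(self, test, if_true, if_false):
        self.test = test
        self.if_true = if_true
        self.if_false = if_false


def classify(node, obj):
    """Walk from the root down to a leaf and return that leaf's category."""
    while isinstance(node, Decision):
        node = node.if_true if node.test(obj) else node.if_false
    return node.category


def is_handsome(obj):
    return obj["appearance"] == "handsome"


def is_rich(obj):
    return obj["fortune"] >= 1_000_000_000


# Root node tests appearance; the two nodes below it test fortune.
tree = Decision(
    is_handsome,
    Decision(is_rich, Leaf("rich and handsome"), Leaf("handsome")),
    Decision(is_rich, Leaf("rich"), Leaf("diaosi")),
)

# Hypothetical objects to classify.
people = [
    {"name": "A", "appearance": "handsome", "fortune": 2_000_000_000},
    {"name": "B", "appearance": "handsome", "fortune": 50_000},
    {"name": "C", "appearance": "plain", "fortune": 3_000_000_000},
    {"name": "D", "appearance": "plain", "fortune": 10_000},
]
results = {p["name"]: classify(tree, p) for p in people}
```

Keeping the tree as data means a learning algorithm can build it automatically instead of a person writing each test by hand.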

Most classification algorithms (such as the neural networks and SVM discussed below) behave like black boxes: they produce a result, but you cannot easily see how the classification was made. A decision tree, by contrast, is easy to understand. Depending on the split criterion used, decision tree methods can be divided into information-theory-based methods and minimum Gini index methods.
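The two families of split criteria can be illustrated with the standard definitions of entropy (the information-theory-based measure) and Gini impurity. The sketch below assumes those textbook formulas; the helper names and the toy labels are hypothetical. A lower weighted value after a split means a purer, and therefore better, split.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted(measure, groups):
    """Size-weighted average of an impurity measure over the groups of a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * measure(g) for g in groups if g)


# Hypothetical node with four samples and two candidate splits.
parent = ["yes", "yes", "no", "no"]
split_a = [["yes", "yes"], ["no", "no"]]  # separates the classes completely
split_b = [["yes", "no"], ["yes", "no"]]  # leaves both groups mixed

gini_a = weighted(gini, split_a)
gini_b = weighted(gini, split_b)
gain_a = entropy(parent) - weighted(entropy, split_a)  # information gain of split A
gain_b = entropy(parent) - weighted(entropy, split_b)  # information gain of split B
```

Both criteria agree here: split A has the lower Gini value and the higher information gain, so a tree builder using either criterion would choose it.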

Neural Network (Neural Net)

After the KNN and decision tree algorithms, let's look at neural networks.

A neural network is like a child who loves to learn: it does not forget what you teach it, and it applies what it has learned. We feed each input in the learning set (Learning Set) to the neural network and tell it which category the output should be. After the whole learning set has been run, the neural network draws its own conclusions from these examples. Exactly how it generalizes is a black box. We can then check the network with a testing set (Testing Set): if it passes the test (for example, with 80% or 90% accuracy), the neural network has been built successfully, and we can use it to classify new cases.

A neural network models neurons, the basic units of the human brain, and connects them into a network in order to explore models that simulate the functions of the brain's nervous system. The result is an artificial system with intelligent information-processing functions such as learning, association, memory, and pattern recognition. An important feature of a neural network is that it can learn from its environment and store what it learns in the network's synaptic connections. Learning is a process: stimulated by the environment, sample patterns are fed to the network one after another, and the weight matrix of each layer is adjusted according to certain rules (the learning algorithm). When the weights of all layers converge to stable values, the learning process ends. The resulting neural network can then be used to classify real data.
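The loop of feeding samples, adjusting weights by a rule, and stopping once the weights settle can be shown with the simplest possible case. The sketch below is a deliberate simplification: it trains a single neuron with the classic perceptron rule rather than a multi-layer network, and the function names, learning rate, and AND-gate learning set are all illustrative assumptions.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum of the inputs plus the bias is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, learning_rate=0.1, max_epochs=100):
    """Adjust the weights sample by sample until an epoch produces no errors."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                # Perceptron rule: move the weights toward the correct output.
                errors += 1
                weights = [w + learning_rate * delta * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * delta
        if errors == 0:
            # The weights no longer change: learning has converged.
            return weights, bias, epoch
    return weights, bias, max_epochs


# Hypothetical learning set: the logical AND of two binary inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias, epochs = train_perceptron(and_samples)
learned = [predict(weights, bias, x) for x, _ in and_samples]
```

A single neuron can only learn linearly separable patterns such as AND. Multi-layer networks use the same idea of iterative weight adjustment but with a different update rule, typically backpropagation.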

SVM (Support Vector Machine)

The idea of a support vector machine may seem abstract compared with the preceding three algorithms. One way to understand it is that we try to separate the samples by viewing them from a higher dimension. For example, samples that cannot be separated in a one-dimensional space (a straight line) may fall into distinct classes on a two-dimensional plane, and samples scattered on a two-dimensional plane may become separable when viewed in three-dimensional space.
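The step from one dimension to two can be made concrete with a toy dataset. In the sketch below, the mapping x -> (x, x*x), the class names, and the threshold of 1.0 are all assumptions chosen for illustration. It shows only the effect of the mapping and is not an SVM training procedure.

```python
# Hypothetical 1-D samples: the "inner" class sits between two groups of "outer".
samples = [
    (-3.0, "outer"), (-2.0, "outer"),
    (-0.5, "inner"), (0.0, "inner"), (0.5, "inner"),
    (2.0, "outer"), (3.0, "outer"),
]


def separable_by_one_cut(points):
    """True if a single cut point on the line puts each class on its own side."""
    labels = [label for _, label in sorted(points)]
    # Walking along the line, the label may change at most once.
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes <= 1


def lift(x):
    """Map a point on the line to the plane by adding x squared as a second axis."""
    return (x, x * x)


def classify_lifted(point, threshold=1.0):
    """In the plane, the horizontal line at `threshold` separates the classes."""
    return "outer" if point[1] > threshold else "inner"


line_is_separable = separable_by_one_cut(samples)
lifted_all_correct = all(classify_lifted(lift(x)) == label for x, label in samples)
```

On the line no single cut works, because the "outer" class lies on both sides of "inner". After lifting, one straight line in the plane separates the two classes.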

The purpose of the SVM algorithm is to find an optimal hyperplane that maximizes the classification interval (the margin). The optimal hyperplane must not only separate the two classes correctly but also make the classification interval as large as possible. In the two classes of samples, the points closest to the separating surface, which lie on the hyperplanes parallel to the optimal hyperplane, are the support vectors. To find the optimal hyperplane, you only need to find all the support vectors. For a nonlinear SVM, the usual approach is to turn a linearly inseparable problem into a linearly separable one: a nonlinear mapping takes the data features from the low-dimensional input space to a high-dimensional linear feature space, and the optimal linear separating hyperplane is then found in that high-dimensional space.
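The notions of classification interval and support vector can be checked numerically when a separating hyperplane is already given. The sketch below does not solve the SVM optimization problem; it only measures two hand-picked candidate hyperplanes of the form w.x + b = 0 on hypothetical 2-D data. Here "margin" means the distance from the hyperplane to the closest sample, so the full classification interval is twice that value.

```python
import math


def signed_distance(w, b, x):
    """Signed Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin_and_support_vectors(w, b, samples, tol=1e-9):
    """Return (margin, closest points), or (None, []) if a sample is misclassified.

    samples is a list of (point, label) pairs with label +1 or -1.
    """
    distances = [(label * signed_distance(w, b, x), x) for x, label in samples]
    margin = min(d for d, _ in distances)
    if margin <= 0:
        return None, []
    # The points at the minimum distance are the support vectors for this plane.
    support = [x for d, x in distances if d - margin <= tol]
    return margin, support


# Hypothetical two-class data in the plane.
data = [
    ((3.0, 0.0), +1), ((3.0, 2.0), +1), ((4.0, 1.0), +1),
    ((1.0, 0.0), -1), ((1.0, 2.0), -1), ((0.0, 1.0), -1),
]

# Candidate 1: the vertical line x1 = 2.
margin_vertical, support_vertical = margin_and_support_vectors((1.0, 0.0), -2.0, data)
# Candidate 2: a tilted line that also separates the classes.
margin_tilted, support_tilted = margin_and_support_vectors((1.0, 0.5), -2.5, data)
```

Both candidates separate the data, but the vertical line keeps every sample farther away, so it is the better of the two by the maximum-interval criterion. An actual SVM solver searches over all hyperplanes instead of comparing a few by hand.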

The support vector machine (SVM) is a very important algorithm for data mining applications: it has been regarded as one of the best classification algorithms ever since it was introduced.

This article is excerpted from New Internet: Big Data Mining, by Tan Lei, published by Electronic Industry Publishing House.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
