Decision Tree Algorithm (1) -- Some Important Mathematical Concepts


A few words up front.

Right now I am still a high school physics teacher, and occasionally a part-time English teacher, hurriedly seizing whatever time I can to write something about computer science and technology. Partly it is to express my love for computers, and partly -- most importantly, of course -- to satisfy my strong vanity. Haha! Just think of a high school math/physics/chemistry teacher messing around with computers -- doesn't that feel instantly impressive?

I have been writing this series for a month, and I will publish it in installments. I hope it is of some help to everyone. If you do not understand something I wrote, it must be my fault for not explaining it clearly enough. No knowledge in the world is inherently difficult; it only depends on whether your background knowledge and understanding have reached the corresponding level. At least, that is what I think.

Of course, if you think what I write is bad, feel free to criticize it -- casually, harshly, or even in a fit of rage, no problem at all! Point out my problems so that I can improve, and so that we can all make progress together. That would be truly wonderful!





1. Background Knowledge

Let's play a game before we talk about decision trees.

2016 is an Olympic year. My two favorite athletes (inner voice: female ones, of course -- I am a girl myself, hahaha) are Ronda Rousey and Isinbayeva.

OK, now we are going to play a guessing game about athletes.

I think of an athlete's name -- for example, Isinbayeva. You then have 20 chances to ask questions, but I can only answer "yes" or "no".

Our conversation might go like this:
You: Is it a man?
Me: No.
You: Has she participated in past Olympic Games?
Me: Yes.
You: Has she participated twice?
Me: No.
You: Three times?
Me: Yes.
You: In a field event?
Me: Yes.
You: Isinbayeva?
Me: Congratulations, correct!

The game we just played works a little like a decision tree algorithm.

We often use decision trees to deal with classification problems, and in recent years decision trees have frequently been used in data mining algorithms.

The concept of a decision tree is very simple, and we can quickly understand one from a diagram. I will use content from the book "Machine Learning in Action" to explain. The following flowchart is a decision tree: a square represents a judgment module (decision block); an ellipse represents a terminating module (terminating block), meaning a conclusion has been reached and execution can stop; and the arrows leading out of a judgment module are called branches.



This is an imagined email classification system. First, the system checks the domain of the sender's address. If the domain is myemployer.com, the message goes into "boring email that needs to be read." If not, the system checks whether the message content contains the word "hockey." If it does, the message is placed in "friend email that needs to be processed in a timely manner"; otherwise it is classified as "spam, no need to read."
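The flowchart's logic can be sketched as plain if/else rules. This is only an illustration -- the function name and exact category strings are mine, not from the book.

```python
# The imagined email classifier, written as plain if/else rules.

def classify_email(sender_domain, body):
    """Classify one message the way the flowchart describes."""
    if sender_domain == "myemployer.com":
        return "boring email that needs to be read"
    if "hockey" in body.lower():
        return "friend email that needs to be processed in a timely manner"
    return "spam, no need to read"

print(classify_email("myemployer.com", "quarterly report"))
# prints: boring email that needs to be read
```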
2. Constructing Decision Trees

From the description above, we can see that the primary task in constructing a decision tree is to find, at each split, the feature that most easily separates one set of data from another. In the example above, the first thing we looked for was the email's domain name: for the first split, the domain name is our most important classification feature.

To find the decisive feature and obtain the best split, we must evaluate each feature. After a test is complete, the original data set is divided into several subsets according to the chosen best feature. We then examine each subset separately: if all the samples in a subset belong to the same class, we do not need to split it further; if they belong to different classes, we repeat the steps above -- that is, we select another important feature within that subset and divide the data again.

From this description it is easy to see that the process is recursive. To find these best features, we need to understand some mathematical concepts: we will use knowledge from information theory to divide the data set.
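The recursive procedure just described can be sketched roughly as follows. `choose_best_feature` is deliberately left as a parameter: picking it well is exactly what the information-theory material below prepares for. The data format -- a list of (feature-dict, label) pairs -- and all names are my own assumptions for illustration.

```python
def split(data, feature):
    """Group (sample, label) pairs by their value of `feature`."""
    groups = {}
    for sample, label in data:
        groups.setdefault(sample[feature], []).append((sample, label))
    return groups

def build_tree(data, features, choose_best_feature):
    labels = [label for _, label in data]
    if len(set(labels)) == 1:            # all samples agree: stop splitting
        return labels[0]
    if not features:                     # no features left: majority vote
        return max(set(labels), key=labels.count)
    best = choose_best_feature(data, features)
    branches = {}
    for value, subset in split(data, best).items():
        rest = [f for f in features if f != best]
        branches[value] = build_tree(subset, rest, choose_best_feature)
    return (best, branches)
```

With even a trivial chooser (say, "always take the first remaining feature"), this already builds a nested (feature, branches) tree; the whole point of the next sections is to replace that trivial chooser with an information-theoretic one.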



3. Some Mathematical Concepts That Need to Be Understood

The principle of partitioning a data set is to make unordered data more orderly. There are several ways to divide the data; here we build the decision tree algorithm by using information theory to partition the data set. We will first write code to apply the theory to a concrete data set, and then write the code that builds the decision tree. In short, we use information-theoretic metrics to organize disorganized data.

For example, suppose I give you a message:

I love you 1314.

This is a simple message, and we can classify the words in it -- for example, pick out the verbs, numbers, pronouns, nouns, and so on. What we need to know is that information theory uses mathematical statistics to quantify this kind of disorder.

Here we divide the message into three categories: pronouns, verbs, and numbers. Let x1 denote pronouns, x2 denote verbs, x3 denote numbers, and p(xi) the probability that category xi appears in this message. Then we can define the information of xi as:

    l(xi) = -log2 p(xi)

Anyone who finished high school knows that the probability p(xi) is a fraction between 0 and 1, and the base-2 logarithm of such a fraction is negative (the base is greater than 1 while the argument is less than 1). For ease of handling, a minus sign is added in front to make the quantity positive.

Information entropy, referred to simply as entropy, denotes the expectation of this information.

3.1 Information Entropy
Based on the encyclopedia definition, let's take a look at the basic concepts of information theory.

Information theory: information theory is the discipline that applies the methods of probability theory and mathematical statistics to study problems such as the quantification of information, information entropy, communication systems, data transmission, cryptography, and data compression. An information system is a generalized communication system: it refers to all the equipment needed to transmit information from one place to another.

In 1948, Shannon put forward the concept of "information entropy," which solved the problem of quantitatively measuring information. The term "entropy" was borrowed by C. E. Shannon from thermodynamics, where thermal entropy is the physical quantity indicating the degree of disorder of molecular states. Shannon used information entropy to describe the uncertainty of an information source.

We can use entropy to measure how much information there is. Before giving the formula for information entropy, I would like to go over a few basic concepts so that you can understand the formula more easily.


3.2 Random Variables

Let's first look at a few examples.

Someone shoots once; the possible results are hitting ring 0, ring 1, ..., up to ring 10. These possible results can be represented by the 11 numbers 0, 1, 2, ..., 10.

In a product inspection, 4 items are drawn at random from 100 products that may contain defectives. The number of defective items drawn may be 0, 1, 2, 3, or 4, so the possible results can be represented by the 5 numbers 0, 1, 2, 3, 4.

We call such events random experiments. The result of a random experiment (such as the number of the ring hit when shooting) can be represented by a variable, and that variable is called a random variable.

Some characteristics of random variables:
1. Its values can be represented by numbers
2. All possible values can be determined before the experiment
3. The specific value cannot be determined before the experiment
4. All possible values can be listed in some order

Discrete random variables
The possible values can be listed one by one, as in the shooting example above.

Continuous random variables
The possible values cannot be listed one by one -- for example, the temperature at any moment during a day.

Generalization:
In general, if X is a random variable and Y = f(X), then Y is also a random variable.


3.3 Mathematical Expectations

In probability theory, the mathematical expectation is the probability-weighted average of a random variable; it represents the average level of the random variable.

Calculation formula:
Let x1, x2, x3, ..., xn be the possible values of a discrete random variable X, and let p(x1), p(x2), p(x3), ..., p(xn) be their probabilities. (In repeated random experiments, the probability p(xi) can be understood as the frequency f(xi) with which the value xi appears.) Then:

    E(X) = x1·p(x1) + x2·p(x2) + ... + xn·p(xn)

If the formula above looks a little off-putting, just think back to the high school math textbook (which instantly exposes the age of my textbooks, ah well). Here we mainly consider discrete random variables. In a word, the mathematical expectation is the sum, over all values of the random variable, of each value multiplied by the probability of that value occurring in a random experiment.
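The formula can be sketched in a few lines of Python; the function name and the die example are my own.

```python
# A minimal sketch of the expectation formula above.

def expectation(values, probs):
    """E(X) = x1*p(x1) + x2*p(x2) + ... + xn*p(xn)."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(values, probs))

# A fair six-sided die: faces 1..6, each with probability 1/6.
print(expectation([1, 2, 3, 4, 5, 6], [1 / 6] * 6))  # 3.5 (up to rounding)
```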

Generalization:
If Y = f(X), then E(Y) = f(x1)·p(x1) + f(x2)·p(x2) + ... + f(xn)·p(xn).
Let's give an example. Suppose a shooter hits ring 10 with probability 0.5, ring 9 with probability 0.3, and ring 8 with probability 0.2. Then the expected score of one shot is

    E(X) = 10 × 0.5 + 9 × 0.3 + 8 × 0.2 = 9.3

If you have understood the concepts of mathematical expectation and random variables, then let's talk about how information entropy is calculated.

As said above: entropy is the expectation of information. (The expectation of information! The expectation of information!) If you have digested mathematical expectation, you should now be able to see how information entropy is calculated.

Take the earlier example again. We had the message "I love you 1314" and divided it into three categories; now we want to compute the entropy of this message. All we need to do is compute the mathematical expectation of the information over all possible categories. So the formula for entropy is:

    H = -[ p(x1)·log2 p(x1) + p(x2)·log2 p(x2) + ... + p(xn)·log2 p(xn) ]

where n indicates that the information is divided into n classes.
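As a sketch, here are the two quantities in Python; the function names are mine, and the "I love you 1314" probabilities come from treating the sentence as four tokens.

```python
from math import log2

def self_information(p):
    """Information of an outcome with probability p: -log2(p)."""
    return -log2(p)

def entropy(probs):
    """Entropy = expectation of the information over all classes
    (terms with p == 0 contribute nothing)."""
    return sum(p * self_information(p) for p in probs if p > 0)

# "I love you 1314" as four tokens: pronouns {I, you} -> 2/4,
# verb {love} -> 1/4, number {1314} -> 1/4.
print(entropy([0.5, 0.25, 0.25]))  # 1.5
```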




3.4 Information Gain

The change in information before and after a data set is divided is called the information gain. Once we know how to calculate it, we can compute the information gain obtained by splitting the data set on each feature; the feature that yields the highest information gain is the best choice. We should learn how to calculate the information gain now, but let's save that for the next chapter -- I really cannot type this many words in one sitting. In the next chapter we begin to happily write code ~
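As a small preview of what the next chapter works out in detail, here is a rough sketch of computing the information gain of one split. The data format -- a list of (feature-dict, label) pairs -- and all names here are my own assumptions.

```python
from collections import Counter
from math import log2

def label_entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(data, feature):
    """Entropy before splitting minus the weighted entropy of the
    subsets obtained by splitting on `feature`."""
    labels = [label for _, label in data]
    before = label_entropy(labels)
    groups = {}
    for sample, label in data:
        groups.setdefault(sample[feature], []).append(label)
    after = sum(len(g) / len(data) * label_entropy(g)
                for g in groups.values())
    return before - after
```

A feature that separates the classes perfectly gets the maximum gain; a feature that tells the classes nothing gets a gain of zero.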










A few words at the end.
You have to work very hard to make it look effortless.






















