Binning problem: Machine learning

Source: Internet
Author: User
Problem

Suppose 12 sales prices have been sorted as follows: 5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215. Use each of the following methods to divide them into four bins. Under equal-frequency (equal-depth) binning, which bin does 15 fall into? Under equal-width binning, which bin does 15 fall into?

Binning methods are divided into supervised binning and unsupervised binning.

Unsupervised binning

Equal-width binning
The value range of the variable is divided into k intervals of equal width, each of which is treated as a bin.
In this problem the variable ranges from 5 to 215 and k = 4, so each interval has width (215 - 5)/4 = 52.5. The cut points are 57.5, 110 and 162.5, which split the data into 4 bins.
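The cut-point arithmetic above can be checked with a short Python sketch. The function name `equal_width_bins` is made up for illustration, and the rule that a value lying exactly on a cut point goes to the upper bin is an assumption (the problem has no such value, so it does not affect the result here).

```python
prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

def equal_width_bins(values, k):
    """Split values into k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Interior cut points: lo + width, lo + 2*width, ...
    cuts = [lo + i * width for i in range(1, k)]
    bins = [[] for _ in range(k)]
    for v in values:
        # Index of the interval containing v; the maximum is kept in the last bin.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return cuts, bins

cuts, bins = equal_width_bins(prices, 4)
print(cuts)  # [57.5, 110.0, 162.5]
print(bins)  # [[5, 10, 11, 13, 15, 35, 50, 55], [72, 92], [], [204, 215]]
```

Because the two largest prices stretch the range, most values land in the first interval and the third interval stays empty.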
Bin A: 5, 10, 11, 13, 15, 35, 50, 55
Bin B: 72, 92
Bin C: empty
Bin D: 204, 215
So under equal-width binning, 15 falls in the first bin.

Equal-frequency (equal-depth) binning
The observations are sorted in ascending order and split into k parts containing equal numbers of observations, each part forming a bin. For example, the smallest 1/k of the observations form the first bin, and so on.
In this problem there are 12 observations and k = 4, so each bin holds 3 values.
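The same split can be sketched in a few lines of Python. The helper name `equal_frequency_bins` is hypothetical; when the number of observations is not a multiple of k, this sketch lets bin sizes differ by at most one, which is an assumption rather than something the problem specifies.

```python
prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

def equal_frequency_bins(values, k):
    """Split sorted values into k consecutive bins of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    # Bin i covers positions [i*n//k, (i+1)*n//k) of the sorted list.
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]

bins = equal_frequency_bins(prices, 4)
print(bins)  # [[5, 10, 11], [13, 15, 35], [50, 55, 72], [92, 204, 215]]

# 1-based index of the bin that contains the price 15
bin_of_15 = next(i for i, b in enumerate(bins, start=1) if 15 in b)
print(bin_of_15)  # 2
```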
Bin A: 5, 10, 11
Bin B: 13, 15, 35
Bin C: 50, 55, 72
Bin D: 92, 204, 215
So under equal-frequency binning, 15 falls in the second bin.

K-means clustering binning
K-means clustering is used to group the observations into k clusters, but the clustering must preserve the order of the bins: every observation in the first bin is smaller than every observation in the second bin, every observation in the second bin is smaller than every observation in the third bin, and so on. Working this out by hand is too time-consuming, so it should not appear in a written exam.

Supervised binning

Supervised binning takes the value of the dependent variable into account when forming the bins, choosing the bins so that they achieve minimum entropy or minimum description length.

(1) Assume the dependent variable is categorical, taking the values 1, ..., J. Let $p_l(j)$ denote the proportion of observations in bin $l$ ($l = 1, \dots, k$) whose dependent variable equals $j$ ($j = 1, \dots, J$). Then the entropy of bin $l$ is $\sum_{j=1}^{J} \left[ -p_l(j) \log p_l(j) \right]$. If all values of the dependent variable occur in equal proportions in bin $l$, that is, $p_l(1) = \dots = p_l(J) = 1/J$, the entropy of bin $l$ is at its maximum. If the dependent variable takes only one value in bin $l$, that is, one $p_l(j)$ equals 1 and all the others equal 0, the entropy of bin $l$ reaches its minimum.

(2) Let $r_l$ denote the proportion of all observations that fall in bin $l$. The total entropy is then $\sum_{l=1}^{k} r_l \sum_{j=1}^{J} \left[ -p_l(j) \log p_l(j) \right]$. This total entropy should be minimized, which means the bins separate the categories of the dependent variable as well as possible.
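As a rough illustration of the two formulas above, the Python sketch below computes the entropy of a single bin and the weighted total over all bins. The class labels are hypothetical (the problem gives no dependent variable): the five smallest prices are assumed to be class 1 and the remaining seven class 2. The natural logarithm is used, an empty bin is counted as contributing 0, and the function names are made up for this example.

```python
from collections import Counter
from math import log

def bin_entropy(labels):
    """Entropy of one bin: sum over classes j of -p_l(j) * log(p_l(j))."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty bin contributes nothing
    return sum(-(c / n) * log(c / n) for c in Counter(labels).values())

def total_entropy(bins):
    """Total entropy: sum over bins l of r_l * entropy(bin l)."""
    n_total = sum(len(b) for b in bins)
    return sum(len(b) / n_total * bin_entropy(b) for b in bins)

# Hypothetical class labels for the 12 sorted prices (not from the problem).
labels = [1] * 5 + [2] * 7

# Labels grouped by the equal-width bins (sizes 8, 2, 0, 2).
equal_width = [labels[0:8], labels[8:10], [], labels[10:12]]
# Labels grouped by the equal-frequency bins (3 per bin).
equal_freq = [labels[i:i + 3] for i in range(0, 12, 3)]

print(bin_entropy([1, 1, 1]))      # pure bin: minimum entropy, zero
print(bin_entropy([1, 2]))         # two equally likely classes: log(2), the maximum for J = 2
print(total_entropy(equal_width))
print(total_entropy(equal_freq))   # smaller than the equal-width total for these labels
```

With these assumed labels, the equal-frequency bins mix the two classes only in the second bin, so their total entropy is lower than that of the equal-width bins, whose first bin contains both classes.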
