Nonparametric estimation (to be continued ...... Careful!)

Source: Internet
Author: User

We often encounter nonparametric estimation methods: K-nearest neighbors, mean shift, and kernel density estimation. I therefore plan to spend a couple of days working through the theory behind them and to record my impressions here.

1. Introduction: the height difference between men and women.

This is a question an interviewer asked me when I interviewed for a machine learning job at an internet company: how would you measure the difference between the height distributions of men and women?

My first response was mean and variance.

However, the mean and variance fully describe a distribution only under the default assumption that the data are normally distributed. Are the heights of men and women normally distributed? Not necessarily!
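To make this point concrete, here is a minimal sketch in Python. The two samples are made-up numbers, not real height data, chosen only so that they share the same mean and variance while having clearly different shapes (one is two-peaked, the other is concentrated in the middle).

```python
import statistics

# Hypothetical samples (not real measurements), in centimeters.
two_peaked = [160] * 4 + [180] * 4          # two clusters, nothing in between
one_peaked = [170] * 6 + [150, 190]         # one central cluster plus two outliers

for name, sample in [("two_peaked", two_peaked), ("one_peaked", one_peaked)]:
    # pvariance is the population variance (divides by n, not n - 1).
    print(name, statistics.mean(sample), statistics.pvariance(sample))
```

Both samples report the same mean and the same population variance, so a comparison based only on these two numbers cannot tell them apart.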

Next, I thought of a more detailed description, histogram estimation, followed by a similarity measurement between the L1-normalized histograms. There are various measures; common ones include correlation (Euclidean distance, fish line distance), the chi-square coefficient, the intersection coefficient, and other distance measures. For details, refer to: http://blog.csdn.net/cxf7394373/article/details/6955530
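The idea of comparing L1-normalized histograms can be sketched roughly as follows. This is only an illustration: the function names, the bin range, and the height values are all assumptions made for the example, and the chi-square form used here is one symmetric variant among several definitions in use.

```python
import math

def histogram(samples, lo, hi, bins):
    """Count samples into equal-width bins over [lo, hi), then L1-normalize."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins

def intersection(p, q):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(p, q))

def chi_square(p, q):
    """Symmetric chi-square variant: 0.0 for identical histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def euclidean(p, q):
    """Euclidean distance between the two bin vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def correlation(p, q):
    """Pearson correlation between the two bin vectors."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    num = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mq) ** 2 for b in q))
    return num / den if den else 0.0

# Hypothetical heights in centimeters (not real data).
group_a = histogram([160, 162, 165, 170], 150, 200, 5)
group_b = histogram([170, 175, 178, 182], 150, 200, 5)
print(intersection(group_a, group_b), chi_square(group_a, group_b))
print(euclidean(group_a, group_b), correlation(group_a, group_b))
```

Intersection and correlation are similarities (larger means more alike), while chi-square and Euclidean distance are dissimilarities (smaller means more alike), so the direction of each score has to be kept in mind when comparing them.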

Histogram estimation is a fast, simple, and effective nonparametric estimation method that can reflect the data distribution with a certain degree of accuracy. However, its accuracy depends on the bin width: when the bins are too wide, the estimation error is large.
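The effect of bin width can be seen in a small sketch. The helper name `histogram_density` and the height values are assumptions for illustration only; the estimate in each bin is the fraction of samples in that bin divided by the bin width, so the estimate integrates to 1 over the range.

```python
def histogram_density(samples, lo, hi, width):
    """Histogram density estimate over [lo, hi) with equal bins of the given width."""
    nbins = int(round((hi - lo) / width))
    counts = [0] * nbins
    for x in samples:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), nbins - 1)] += 1
    n = sum(counts)
    if n == 0:
        return [0.0] * nbins
    # Density in a bin = (fraction of samples in the bin) / (bin width).
    return [c / (n * width) for c in counts]

# Hypothetical heights in centimeters: two clusters, around 162 and around 176.
heights = [158, 160, 161, 162, 163, 164, 166,
           172, 174, 175, 176, 177, 178, 180]

narrow = histogram_density(heights, 150, 190, 5)    # 8 bins
wide = histogram_density(heights, 150, 190, 20)     # 2 bins
print([round(d, 4) for d in narrow])
print([round(d, 4) for d in wide])
```

With 5 cm bins the two clusters show up as two separate bumps; with 20 cm bins each bin swallows a whole cluster and the estimate becomes flat, which is the loss of accuracy described above.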

Later, I also thought of modeling the heights of men and women with a Gaussian mixture model. This extends the single-peak Gaussian assumption to several peaks. It is still a parametric approach to probability density estimation, and other parametric models could be used in the same way.
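As a rough illustration of the mixture idea, the sketch below fits a one-dimensional Gaussian mixture with a plain EM loop. It is a minimal version written for this note, not a reference implementation: the function name `fit_gmm_1d`, the quantile-based initialization, the fixed iteration count, and the height values are all assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def fit_gmm_1d(xs, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture with EM; returns (weights, means, stds)."""
    n = len(xs)
    ordered = sorted(xs)
    # Assumed initialization: means at evenly spaced quantiles, shared std, equal weights.
    mus = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    mean = sum(xs) / n
    sigma0 = math.sqrt(sum((x - mean) ** 2 for x in xs) / n) or 1.0
    sigmas = [sigma0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate weight, mean, and std of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
            weights[j] = nj / n
    return weights, mus, sigmas

# Hypothetical heights in centimeters: two clusters, around 162 and around 176.
heights = [158, 160, 161, 162, 163, 164, 166,
           172, 174, 175, 176, 177, 178, 180]
weights, mus, sigmas = fit_gmm_1d(heights, k=2)
print(weights, mus, sigmas)
```

The fitted component means and standard deviations can then be compared directly, but the result is only as good as the assumption that each group really is Gaussian.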

A great deal of real-world data follows a distribution that cannot be known in advance, so no known model can be used to fit it. In such cases we need powerful nonparametric estimation methods. The histogram mentioned above is one of them.

The following uses Duda's Pattern Classification as a reference to introduce the application of nonparametric estimation techniques to probability density estimation.
