"Machine Learning Basics" noise and error

Source: Internet
Author: User

Target distribution

In practice, training data can be noisy: some examples may be mislabeled, or the values of some input dimensions may be inaccurate.
Because of noise, y is no longer a fixed function of x. Instead, we treat y as a random variable sampled from a conditional distribution, y ~ P(y|x).


Here P(y|x) is called the target distribution.
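To make the idea of sampling y from P(y|x) concrete, here is a minimal sketch. It assumes a hypothetical noise-free target (the sign of x) and a hypothetical flip probability of 0.1; neither comes from the original notes, they only illustrate label noise.

```python
import random

def ideal_target(x):
    """Hypothetical noise-free target f(x): +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sample_label(x, flip_prob, rng):
    """Draw y from P(y|x): the ideal label with probability 1 - flip_prob,
    the opposite label with probability flip_prob."""
    y = ideal_target(x)
    return -y if rng.random() < flip_prob else y

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]  # x ~ P(x)
ys = [sample_label(x, 0.1, rng) for x in xs]          # y ~ P(y|x)

# Fraction of labels that disagree with the ideal target; close to flip_prob.
noisy_fraction = sum(y != ideal_target(x) for x, y in zip(xs, ys)) / len(xs)
print(f"fraction of flipped labels: {noisy_fraction:.3f}")
```

With a flip probability of 0, P(y|x) puts all its mass on the ideal label, so the noise-free setting is a special case of the target distribution.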

Looking back, we can summarize the goal of learning as predicting the ideal target (with respect to P(y|x)) on commonly seen inputs (with respect to P(x)).

Measurement of error

Previously we used the out-of-sample error E_out(g) to measure error. That measure is taken over unseen samples x, is evaluated pointwise on each individual x, and is defined for classification (the 0/1 error).

Pointwise error measure

We build the overall error from a pointwise error measure, denoted err: the error is evaluated at each point and then averaged over the points.


We use the 0/1 error for classification problems and the squared error for regression problems.
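The two pointwise measures can be sketched as follows. The function names (err_01, err_squared, average_error) and the sample data are hypothetical, chosen only for illustration.

```python
def err_01(y_pred, y):
    """0/1 error: 1 if the prediction is wrong, 0 if it is right."""
    return 0 if y_pred == y else 1

def err_squared(y_pred, y):
    """Squared error: squared distance between prediction and label."""
    return (y_pred - y) ** 2

def average_error(err, predictions, labels):
    """Overall error: the pointwise error averaged over all points."""
    return sum(err(p, y) for p, y in zip(predictions, labels)) / len(labels)

# Classification: labels in {-1, +1}; one of the four predictions is wrong.
class_error = average_error(err_01, [1, -1, 1, 1], [1, -1, -1, 1])

# Regression: real-valued labels.
reg_error = average_error(err_squared, [1.0, 2.0, 4.0], [1.0, 3.0, 2.0])

print(class_error)  # 0.25
print(reg_error)    # (0 + 1 + 4) / 3
```

Passing the pointwise measure as an argument keeps the averaging step the same for both problem types; only err changes.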

A new learning process
Types of error

Different types of errors can lead to different penalty strategies.


The two types are false accept (the example is actually negative but is judged positive) and false reject (the example is actually positive but is judged negative).
Elsewhere these two errors are also called false positive (a false alarm, for example a legitimate action judged illegal) and false negative (a miss, for example an illegal action judged legitimate). The terms are easy to confuse, because which case counts as positive depends on the application. A medical reading helps: positive means the disease or virus is present and negative means normal, so a false positive is a healthy person diagnosed as infected, and a false negative is an infected person diagnosed as healthy.
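As a small illustration of the two error types, the sketch below counts false accepts and false rejects. It assumes labels in {-1, +1} with +1 meaning accept; the function name and the data are hypothetical.

```python
def count_error_types(predictions, labels):
    """Count false accepts and false rejects for labels in {-1, +1}.

    +1 means "accept" (positive case), -1 means "reject" (negative case).
    """
    pairs = list(zip(predictions, labels))
    # False accept: actually negative, but predicted positive.
    false_accept = sum(1 for p, y in pairs if y == -1 and p == 1)
    # False reject: actually positive, but predicted negative.
    false_reject = sum(1 for p, y in pairs if y == 1 and p == -1)
    return false_accept, false_reject

labels      = [1, 1, -1, -1, 1, -1]
predictions = [1, -1, 1, -1, 1, 1]
fa, fr = count_error_types(predictions, labels)
print(fa, fr)  # 2 1
```

The plain 0/1 error would report only the total (3 mistakes out of 6) and hide how the mistakes split between the two types.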

Example: supermarket fingerprint identification

Suppose a supermarket uses fingerprint identification to give discounts: a customer whose fingerprint was registered earlier as a VIP gets the discount, and anyone else does not.
If a false reject occurs, a VIP customer is denied the discount and may be unhappy, so the supermarket may lose some future business. If a false accept occurs, the supermarket only loses a little money.
In the supermarket's cost table, a false reject is therefore relatively expensive and a false accept is cheap, so the supermarket should try hardest to avoid false rejects.


Example: CIA fingerprint identification

Suppose the CIA uses fingerprint identification to decide whether a person may access a system holding important information.
Here a false accept has very serious consequences, while a false reject does little harm.
In the CIA's cost table, a false accept is therefore the expensive error, and it is the one to avoid above all.
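The two cost tables can be turned into a cost-weighted error, sketched below. The cost values (1 and 10 for the supermarket, 1000 and 1 for the CIA) are hypothetical numbers chosen for illustration; the text above only says which error is more expensive in each case.

```python
def weighted_error(predictions, labels, false_accept_cost, false_reject_cost):
    """Average cost-weighted error for labels in {-1, +1} (+1 = accept)."""
    total = 0.0
    for p, y in zip(predictions, labels):
        if y == -1 and p == 1:      # false accept
            total += false_accept_cost
        elif y == 1 and p == -1:    # false reject
            total += false_reject_cost
    return total / len(labels)

labels      = [1, 1, -1, -1, 1, -1]
predictions = [1, -1, 1, -1, 1, 1]   # 2 false accepts, 1 false reject

# Hypothetical cost tables (illustrative numbers only).
supermarket = weighted_error(predictions, labels,
                             false_accept_cost=1, false_reject_cost=10)
cia = weighted_error(predictions, labels,
                     false_accept_cost=1000, false_reject_cost=1)

print(supermarket)  # (2*1 + 1*10) / 6 = 2.0
print(cia)          # (2*1000 + 1*1) / 6 = 333.5
```

The same predictions get very different scores under the two tables, which is why the choice of error measure should follow the application.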


Summary

We should choose different algorithm design strategies according to the different error costs. The specifics are covered later, when individual algorithms are introduced; here we only introduce the concept.
To sum up the machine learning process: earlier, we measured error with err. In practical applications, the design strategy depends on the type of error, so an error measure written err hat is used to evaluate errors in the real-world setting.
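One way error costs can shape a design choice is sketched below: picking the acceptance threshold that minimizes a cost-weighted error. This is only an illustrative strategy, not the method of the original notes; the match scores, labels, and cost values are hypothetical.

```python
def weighted_error_at(threshold, scores, labels, fa_cost, fr_cost):
    """Cost-weighted error when a score >= threshold means accept (+1)."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = 1 if s >= threshold else -1
        if y == -1 and p == 1:      # false accept
            total += fa_cost
        elif y == 1 and p == -1:    # false reject
            total += fr_cost
    return total / len(labels)

def best_threshold(scores, labels, fa_cost, fr_cost):
    """Try each observed score as a threshold and keep the cheapest one."""
    candidates = sorted(set(scores))
    return min(
        candidates,
        key=lambda t: weighted_error_at(t, scores, labels, fa_cost, fr_cost),
    )

# Hypothetical fingerprint match scores and true labels (+1 = genuine user).
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [-1, -1, 1, -1, 1, 1]

lenient = best_threshold(scores, labels, fa_cost=1, fr_cost=10)    # supermarket-like
strict = best_threshold(scores, labels, fa_cost=1000, fr_cost=1)   # CIA-like

print(lenient)  # 0.4
print(strict)   # 0.7
```

When false rejects are expensive the chosen threshold is lower (more people are accepted), and when false accepts are expensive it is higher.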


When reprinting, please credit the author, Jason Ding, and cite the source
GitHub home page (http://jasonding1354.github.io/)
CSDN Blog (http://blog.csdn.net/jasonding1354)
Jianshu homepage (http://www.jianshu.com/users/2bd9b48f6ea8/latest_articles)
