What Makes a Good Feature in Machine Learning?

Last Update:2016-04-29 Source: Internet

Author: User

Lecture video: What Makes a Good Feature? - Machine Learning Recipes #3
https://www.youtube.com/watch?v=N9fDIAflCMY

A classifier can perform well only if you give it good features. Providing or identifying good features is one of the most important tasks in machine learning.

Suppose you want to classify dogs by breed, distinguishing Greyhounds from Labradors.

We consider two features: height (in inches) and eye color.

Here we assume that dogs of both breeds have eyes of only two colors, blue and brown.

Let's analyze the height feature first.

Greyhounds are usually taller than Labradors, but the real world is more complicated, and the height of each breed varies over a range.

We use Python to generate random height data, with an average height of 28 inches for Greyhounds and 24 inches for Labradors, and draw a histogram. Red is Greyhound, blue is Labrador.
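The data generation described above can be sketched roughly as follows. Only the two average heights come from the text. The sample size of 500 dogs per breed, the spread of 4 inches, and the use of a normal distribution are assumptions made for illustration. The sketch prints a text histogram (G for Greyhound, L for Labrador) instead of the red and blue plot, so that it needs nothing beyond the standard library.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

N_DOGS = 500    # assumed number of dogs per breed
SPREAD = 4.0    # assumed standard deviation, in inches

# The means come from the text: Greyhound 28 inches, Labrador 24 inches.
greyhound_heights = [random.gauss(28, SPREAD) for _ in range(N_DOGS)]
labrador_heights = [random.gauss(24, SPREAD) for _ in range(N_DOGS)]


def bin_counts(heights, width=2):
    """Count how many heights fall into each bin of the given width."""
    return Counter(int(h // width) * width for h in heights)


grey_bins = bin_counts(greyhound_heights)
lab_bins = bin_counts(labrador_heights)

# Text histogram: one character stands for 5 dogs.
for start in sorted(set(grey_bins) | set(lab_bins)):
    bar = "G" * (grey_bins[start] // 5) + "L" * (lab_bins[start] // 5)
    print(f"{start:3d}-{start + 2:<3d} {bar}")
```

With a plotting library, the same two lists could be passed to a histogram function to get the colored version.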

Let's analyze this histogram. First look at the left side, at one of the smaller heights. If you had to predict the breed of a dog of that height, you should guess Labrador, because at that height there is about an 80% probability that the dog is a Labrador and only a 20% probability that it is a Greyhound. Now look at the right side, at one of the larger heights. There the dog is a Greyhound with about 95% probability, so we should predict Greyhound.

However, in the middle of the histogram the probabilities of the two breeds are not very different, so for these heights it is hard to tell the breeds apart.

Therefore, height is a useful feature, but not a perfect one.
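The reading of the histogram can be checked numerically with a small simulation. This is a sketch under assumptions: the normal distribution, the spread of 4 inches, the sample size, and the three height ranges are chosen for illustration and are not taken from the article, so the percentages it prints will not match the figures quoted above exactly.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Assumed simulation settings; only the two means come from the text.
greyhounds = [random.gauss(28, 4) for _ in range(5000)]
labradors = [random.gauss(24, 4) for _ in range(5000)]


def greyhound_share(low, high):
    """Fraction of simulated dogs with height in [low, high) that are Greyhounds."""
    n_grey = sum(low <= h < high for h in greyhounds)
    n_lab = sum(low <= h < high for h in labradors)
    return n_grey / (n_grey + n_lab)


short = greyhound_share(18, 22)   # left side of the histogram
middle = greyhound_share(25, 27)  # where the two breeds overlap
tall = greyhound_share(32, 36)    # right side of the histogram

print(f"short dogs:  {short:.0%} Greyhound")
print(f"middle dogs: {middle:.0%} Greyhound")
print(f"tall dogs:   {tall:.0%} Greyhound")
```

The share of Greyhounds is low among short dogs, high among tall dogs, and close to one half in the middle, which is the range where height alone cannot separate the breeds.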

To work out which features you should use, try a thought experiment: imagine that you are the classifier. If you were trying to decide whether a dog is a Greyhound or a Labrador, what else would you want to know? You might ask: What is their hair like? How fast can they run? How much do they weigh?

Deciding how many features to use is more of an art than a science. As a rule of thumb, think about how many features you would need to solve the problem yourself; the classifier will probably need about as many.

Now look at another feature, eye color. We assume that dogs of both breeds have only two eye colors, blue and brown, and that a dog's eye color is unrelated to its breed.

The histogram for this feature may look almost the same for both breeds. It tells us nothing, because dogs of either breed are about equally likely to have either eye color, so eye color is a useless feature. If you give a classifier such a useless feature, it can hurt the classification accuracy. A feature like this may appear useful, but only because of chance patterns in the data. This is especially likely when you have very little training data.
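The effect of sample size on a useless feature can be sketched with a coin-flip simulation. The assumption that blue and brown eyes are equally common in both breeds, and the two sample sizes, are illustrative choices and are not taken from the article.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable


def blue_eye_rate(n_dogs):
    """Fraction of n_dogs with blue eyes when eye color is a fair coin flip."""
    return sum(random.random() < 0.5 for _ in range(n_dogs)) / n_dogs


# Eye color is generated the same way for both breeds, so any gap
# between the two rates is pure chance.
for n in (10, 10000):
    grey_rate = blue_eye_rate(n)
    lab_rate = blue_eye_rate(n)
    gap = abs(grey_rate - lab_rate)
    print(f"{n:6d} dogs per breed: Greyhound {grey_rate:.1%} blue, "
          f"Labrador {lab_rate:.1%} blue, gap {gap:.1%}")
```

With only 10 dogs per breed the gap can be large even though the feature carries no information. With 10,000 dogs per breed it shrinks toward zero.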

Moreover, features should be independent of one another, because independent features give you information from different angles. For example, if your data already contains height in inches, adding height in centimeters makes no sense, because it provides no new information. You should try to remove redundant features like this. Many classifiers cannot tell that two highly correlated features describe the same thing, so they effectively count that information twice and treat it as more important than it is, which is obviously not what we want.
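One simple way to spot this kind of redundancy is to compute the correlation between two features. The sketch below uses made-up heights and a hand-written Pearson correlation; the sample itself is an assumption for illustration.

```python
import math
import random

random.seed(3)  # fixed seed so the run is repeatable

heights_in = [random.gauss(26, 4) for _ in range(200)]  # assumed sample, inches
heights_cm = [h * 2.54 for h in heights_in]             # same data in centimeters


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(heights_in, heights_cm)
print(f"correlation between inches and centimeters: {r:.6f}")
```

A correlation at or very near 1 means the second feature is just a rescaled copy of the first, so one of the two can be dropped.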

In addition, features should be easy to understand. For example, suppose we want to predict how many days it takes to mail a letter from one city to another. Obviously, the farther apart the two cities are, the more days it takes.

Here, the distance in miles between the cities is a very good feature. A much poorer choice is to use the coordinates of the two cities.

From a person's point of view, it is easy to estimate the number of days from the distance in miles, but hard to estimate it from the coordinates alone. If you use hard-to-understand features such as coordinates, you will need much more training data than you would with the easy-to-understand feature.
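If only coordinates are available, they can be turned into a distance feature before training. The sketch below uses the haversine great-circle formula, which is one common choice and is not mentioned in the article. The function name, the Earth radius value, and the two example cities with approximate coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical example: approximate (latitude, longitude) of two cities.
new_york = (40.71, -74.01)
los_angeles = (34.05, -118.24)

distance = miles_between(*new_york, *los_angeles)
print(f"distance feature: {distance:.0f} miles")
```

The four coordinate values collapse into a single number that relates directly to delivery time, which is what makes the feature simple.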

To summarize, an ideal feature should be:

1) Informative: it carries information that helps tell the classes apart;

2) Independent: it does not duplicate other features;

3) Simple: it is easy to understand.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
