Use features to narrow the scope of data mining


Zheng @ playfun RT 20091124

On the topic of social data mining, here are a few thoughts for reference:

When looking for new value in social data mining in mainland China, we generally consider two questions:
1. Is there enough data?
2. How can the data be shown to be valid and valuable, or how can it be cleaned?

In general, most ideas fail at the first question.

OneRiot's PulseRank is somewhat interesting because, in English, there is enough data no matter what you search for. When there is very little data, ranking or sorting is meaningless. That is why I once said that one trait of a vertical domain that machine intelligence can enter is its information source: "there is enough information on the web, and it is fragmented and scattered." If there is little data, machine intelligence is not needed at all: hire an editor and the job is done, especially since the data changes little. And if your machines spend half a day processing the data to produce something, other websites can copy and paste it from you in the blink of an eye.

Suppose the first condition is met, but you have no feature to serve as an entry point. Two problems follow. First, you are directly testing your machines' parallel processing and indexing capacity. Second, you must spend a great deal of time processing junk data, which wastes effort you could have spent on something else. So for machine intelligence, you need a shortcut that narrows the scope of computation within the massive collection. This is the basic approach.

In other words: "When data volumes are massive, you must first use features and rules to filter and clean the data."
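The principle above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's method: the `Post` record, the seed keyword, the length and link thresholds, and the share-count ranking are all hypothetical stand-ins. Cheap feature and rule checks run over every record first, so the expensive step only sees the small subset that survives.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Post:
    """A hypothetical social-media post record."""
    text: str
    shares: int


# A rule returns True if the post should be kept (hypothetical rule set).
Rule = Callable[[Post], bool]

URL_RE = re.compile(r"https?://\S+")


def has_topic_keyword(keywords: Iterable[str]) -> Rule:
    """Feature rule: keep posts that mention at least one seed keyword."""
    lowered = [k.lower() for k in keywords]
    return lambda p: any(k in p.text.lower() for k in lowered)


def min_length(n: int) -> Rule:
    """Cleaning rule: drop posts too short to carry information."""
    return lambda p: len(p.text.strip()) >= n


def max_links(n: int) -> Rule:
    """Cleaning rule: drop link-stuffed posts, a common spam signal."""
    return lambda p: len(URL_RE.findall(p.text)) <= n


def prefilter(posts: Iterable[Post], rules: List[Rule]) -> List[Post]:
    """Drop exact duplicate texts, then keep posts that pass every rule."""
    seen = set()
    kept = []
    for p in posts:
        key = p.text.strip().lower()
        if key in seen:
            continue
        seen.add(key)
        if all(rule(p) for rule in rules):
            kept.append(p)
    return kept


def expensive_rank(posts: List[Post]) -> List[Post]:
    """Placeholder for the costly step (indexing, ranking, NLP).

    Here it simply sorts by share count, highest first.
    """
    return sorted(posts, key=lambda p: p.shares, reverse=True)


posts = [
    Post("New phone launch event tonight, who is going?", 120),
    Post("phone", 5),  # too short
    Post("Buy now http://a.example http://b.example http://c.example phone", 300),  # link spam
    Post("Weather is nice today", 50),  # off-topic
    Post("New phone launch event tonight, who is going?", 90),  # duplicate
    Post("Hands-on review of the new phone camera", 200),
]
rules = [has_topic_keyword(["phone"]), min_length(10), max_links(1)]
candidates = prefilter(posts, rules)
ranked = expensive_rank(candidates)
print([p.shares for p in ranked])  # [200, 120]
```

The design choice is ordering. Each rule costs a few string operations per post, while the real cost lies in the downstream step (in practice indexing, language processing, or graph analysis). Shrinking that step's input is where the savings come from.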

Recommended reading:

1. Semantics and features