Data mining: how to do it well (as I imagine it)

Source: Internet
Author: User

The qualifier in the title is there because I have not done it well myself yet; I have only thought about the problems I ran into and how to solve them!

My recent problems are probably caused by high dimensionality. Most of the theory and examples in books are low-dimensional (fewer than 100 dimensions). The theory is perfect, but in practice all kinds of problems appear, and there are a lot of them; some could not have been imagined in advance. Take the question I wrote up a few days ago, "high-dimensional random numbers": what is the distribution of several uniform random numbers? I still haven't figured it out.
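One possible reading of that question is "how is the sum of several independent uniform random numbers distributed?". That reading is my assumption, not necessarily what the original question meant. Under it, the sum of n independent Uniform(0, 1) values has mean n/2 and variance n/12, and a quick simulation is a cheap way to check such a claim before trusting it. The sketch below uses only the Python standard library; `sum_of_uniforms`, the dimension, and the sample count are hypothetical choices made for illustration.

```python
import random
import statistics


def sum_of_uniforms(n_dims, n_samples, seed=0):
    """Draw n_samples points uniformly from [0, 1)^n_dims and
    return the sum of the coordinates of each point."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_dims)) for _ in range(n_samples)]


n_dims = 100      # illustrative dimension, not taken from the post
n_samples = 5000  # illustrative sample count

sums = sum_of_uniforms(n_dims, n_samples)
sample_mean = statistics.fmean(sums)
sample_var = statistics.pvariance(sums)

# Compare the simulation with the theoretical mean n/2 and variance n/12.
print(f"mean:     {sample_mean:.3f}  (theory {n_dims / 2:.3f})")
print(f"variance: {sample_var:.3f}  (theory {n_dims / 12:.3f})")
```

If the simulated values disagree badly with the theory, either the theory has been misapplied or the code is wrong, which is exactly the kind of check that is hard to do without the statistics.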

I now think that doing data mining well requires the following:

1. Probability and Statistics:

I don't know whether data mining grew out of databases or out of statistics, but statistics is now indispensable. Whether data mining is used to extract rules or to make predictions, statistical techniques such as clustering, classification, and regression are essential. If you do not understand the theory and simply code up algorithms given by others, you will be completely stuck when something goes wrong: without a knowledge of statistics, you do not know where the problem is or how to solve it.

2. Optimization Algorithms and Numerical Computation:

Implementing statistical algorithms requires computation, and not simple computation. Data mining generally involves a large amount of computation, because statistical computation does. If you keep using the same formulas as in the low-dimensional case, with n-th powers, n-th orders, and n nested loops, the final amount of computation may be unacceptable, so approximate computation becomes inevitable. How should you approximate? Where should you approximate and where should you not? Which approximations cause a large loss of accuracy? If you do not understand optimization algorithms and numerical computation, then you will ....
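To make the trade-off concrete, here is a minimal sketch of one kind of approximation: replacing an exact O(n^2) computation with a random-sampling estimate. The task (mean pairwise Euclidean distance), the function names, and the data sizes are all assumptions chosen for illustration; the post itself does not name a specific computation.

```python
import math
import random


def mean_pairwise_distance_exact(points):
    """Average Euclidean distance over all pairs: n*(n-1)/2 distance evaluations."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(points[i], points[j])
    return total / (n * (n - 1) / 2)


def mean_pairwise_distance_sampled(points, n_pairs, rng):
    """Monte Carlo estimate using n_pairs randomly chosen pairs of distinct points."""
    total = 0.0
    for _ in range(n_pairs):
        a, b = rng.sample(points, 2)
        total += math.dist(a, b)
    return total / n_pairs


rng = random.Random(0)
# 300 illustrative points in 50 dimensions, uniform on [0, 1).
points = [[rng.random() for _ in range(50)] for _ in range(300)]

exact = mean_pairwise_distance_exact(points)                # 44850 distance evaluations
approx = mean_pairwise_distance_sampled(points, 2000, rng)  # 2000 distance evaluations
rel_error = abs(approx - exact) / exact

print(f"exact:  {exact:.4f}")
print(f"approx: {approx:.4f}  (relative error {rel_error:.4%})")
```

Sampling works here because an average tolerates noise. The same trick applied to a maximum or a minimum would lose far more accuracy, which is the "where to approximate and where not to" question in miniature.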

3. Program Efficiency

The same algorithm can be implemented by different programs, and which one is more efficient is a real question. How can efficiency be improved? I do not yet know the features of the languages well enough, for example which language processes loops quickly and which is fast at numerical computation.
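As a small illustration of "same algorithm, different programs", the sketch below computes a sum of squares in two ways and times both with the standard `timeit` module. The function names and the input size are hypothetical. On CPython the built-in version is usually the faster one, but that is something to measure on your own machine rather than assume.

```python
import math
import random
import timeit


def squared_norm_indexed(values):
    """Sum of squares with an explicit index loop, written the way one might in C."""
    total = 0.0
    i = 0
    while i < len(values):
        total = total + values[i] * values[i]
        i = i + 1
    return total


def squared_norm_builtin(values):
    """The same computation using the built-in sum over a generator."""
    return sum(v * v for v in values)


rng = random.Random(0)
values = [rng.random() for _ in range(100_000)]  # illustrative input size

result_indexed = squared_norm_indexed(values)
result_builtin = squared_norm_builtin(values)

# Time 20 repetitions of each version; the absolute numbers depend on the machine.
time_indexed = timeit.timeit(lambda: squared_norm_indexed(values), number=20)
time_builtin = timeit.timeit(lambda: squared_norm_builtin(values), number=20)

print(f"indexed loop: {time_indexed:.3f} s")
print(f"built-in sum: {time_builtin:.3f} s")
```

Both versions perform the same arithmetic, so their results agree up to floating-point rounding; only the cost of running them differs.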

Beyond that, my boss says one should also understand the characteristics of the CPU, computer architecture, parallel computing, and so on. For now, that still seems some distance away ...
