The pitfalls I have stumbled into on data mining projects over the years

Source: Internet
Author: User

A data mining project involves many steps and depends heavily on data, so it is perfectly normal to hit a pitfall at any one of them.
Unclear requirements are the first big pitfall. They send the downstream analysis off in the wrong direction, and they make it easy to be pushed into accepting mining goals and business objectives that cannot be met. This mostly happens when a company has only just created the role. As projects accumulate, people come to understand both the limits and the risks of data mining, and they stop setting unachievable goals. Here is one requirement that led me into the ditch. A manager once asked me to find the key factors that affect user loyalty. I toiled away at it: wrote the data extraction request, waited for the data, wrote the analysis report, and settled on a few key factors. Only when I met the customer did I discover that they had been asking about the factors that affect the stickiness of high-end users. The scope was wrong, and it was a heavy blow. To avoid this pitfall, talk to the front-line customers yourself. Do not shy away from it. Make sure you understand what they actually want, rather than working from a version that has been distorted as it was passed along, only to see the effort wasted.
The quality of the data itself is the second big pitfall. Most data collected by production systems is not collected for mining. It exists to serve the business's direct revenue, so only the indicators that directly affect marketing are truly reliable. The quality of the other, auxiliary indicators is mediocre at best. Of the hundreds of indicators we asked for, only about 20 turned out to be usable. Beyond errors made when the data is recorded, there may be problems with precision, bias and accuracy, inconsistent data, missing data, outliers, duplicates and so on. There is no real way around this pitfall. You can only learn as much as possible about the underlying data in the source systems, gather information, improve data quality where you can, think divergently to derive more analytical dimensions, and then do your best and leave the rest to fate!
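A few of the problems listed above (missing values, duplicates, outliers) can be counted mechanically before any analysis starts. The following is only a minimal sketch in plain Python, not a method from any particular project: the function name `quality_report`, the field names, the sample rows, and the outlier rule (distance from the median measured in median absolute deviations, with an arbitrary factor of 5) are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def quality_report(rows, key, numeric_fields, mad_factor=5.0):
    """Count missing values, duplicated keys, and numeric outliers in rows."""
    report = {"rows": len(rows), "missing": {}, "duplicate_keys": [], "outliers": {}}

    # Missing values: None or an empty string counts as missing.
    fields = sorted({name for row in rows for name in row})
    for name in fields:
        missing = sum(1 for row in rows if row.get(name) in (None, ""))
        if missing:
            report["missing"][name] = missing

    # Duplicates: the same key value appearing on more than one row.
    key_counts = Counter(
        row.get(key) for row in rows if row.get(key) not in (None, "")
    )
    report["duplicate_keys"] = sorted(k for k, n in key_counts.items() if n > 1)

    # Outliers: values far from the median, measured in units of the
    # median absolute deviation (MAD). The factor is an arbitrary assumption.
    for name in numeric_fields:
        values = [row[name] for row in rows if isinstance(row.get(name), (int, float))]
        if len(values) < 3:
            continue
        centre = median(values)
        mad = median(abs(v - centre) for v in values)
        if mad == 0:
            continue  # no spread to compare against
        flagged = [v for v in values if abs(v - centre) > mad_factor * mad]
        if flagged:
            report["outliers"][name] = flagged
    return report


# Hypothetical sample data, invented for this sketch.
sample_rows = [
    {"user_id": "u1", "monthly_spend": 30, "region": "north"},
    {"user_id": "u2", "monthly_spend": 32, "region": "north"},
    {"user_id": "u3", "monthly_spend": None, "region": "south"},
    {"user_id": "u4", "monthly_spend": 29, "region": ""},
    {"user_id": "u4", "monthly_spend": 29, "region": "south"},
    {"user_id": "u5", "monthly_spend": 900, "region": "east"},
    {"user_id": "u6", "monthly_spend": 31, "region": "east"},
]

report = quality_report(sample_rows, key="user_id", numeric_fields=["monthly_spend"])
print(report)
```

A report like this does not fix anything on its own, but it shows early which indicators are too incomplete or too noisy to build an analysis on.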
Data problems introduced during extraction are the third big pitfall. I particularly remember my first year after graduating, when I was still a nobody. On one mining project, with delivery due the next day (the extraction had taken several cycles and badly delayed the schedule), a group of us worked on a messy pile of data until 3 o'clock in the morning, only to discover that a key ID was wrong and none of the data could be used. It made us want to die, and it made us want to kill someone. It also taught us a painful lesson: never skip a careful check of the data because you trust someone or because the schedule is tight. When the data is wrong, all you can do is send it back to be re-extracted, however tight the time. Then again, pulling data from multiple data warehouses is tedious, exhausting work, and even the better extraction staff make a few silly mistakes. Some of the logic is not fully thought through. They are also less sensitive than we are to what the data means, and they are working under time pressure too. So the way to avoid this pitfall is actually very simple: check the data! Check the data! Check the data!!!
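One cheap check that would have caught a wrong key ID is to compare the IDs in a fresh extract against a list of IDs already known to be good before anyone starts analysing. The sketch below is an illustration under assumptions, not the check used on that project: the function name `check_join_key`, the `user_id` field, the sample IDs, and the 99% match threshold are all hypothetical.

```python
def check_join_key(extract, reference_ids, key="user_id", min_match_rate=0.99):
    """Compare the key column of an extract against a set of known-good IDs."""
    ids = [row.get(key) for row in extract]
    present = [i for i in ids if i not in (None, "")]

    # IDs that appear in the extract but not in the reference set.
    unmatched = sorted({i for i in present if i not in reference_ids})
    matched = sum(1 for i in present if i in reference_ids)

    # Rows with a missing key count against the match rate.
    match_rate = matched / len(ids) if ids else 0.0
    return {
        "rows": len(ids),
        "missing": len(ids) - len(present),
        "unmatched": unmatched,
        "match_rate": match_rate,
        "ok": match_rate >= min_match_rate,
    }


# Hypothetical data: one ID was exported with a stray leading zero,
# and one row has no ID at all.
known_ids = {"1001", "1002", "1003", "1004"}
extract_rows = [
    {"user_id": "1001", "spend": 30},
    {"user_id": "1002", "spend": 32},
    {"user_id": "01003", "spend": 29},
    {"user_id": None, "spend": 31},
]

key_check = check_join_key(extract_rows, known_ids)
if not key_check["ok"]:
    print("Send the extract back before analysing it:", key_check)
```

Running a check like this takes seconds when the extract arrives, which is far cheaper than finding the same problem at 3 o'clock in the morning.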
Even if the first three pitfalls can be filled, the next one is beyond what ordinary junior staff like us can fill: getting the support of the people with power. At the end of the day, data mining is icing on the cake. Front-line staff and decision makers who are buried in production problems have no time or energy for icing on the cake, so projects are often postponed or refused a go-live for one reason or another. In general, the theory and technology of data mining are fairly mature. However, limited by the data currently collected and by the state of system construction, there is still some distance to go before truly high-level applications are reached. For now, most work stays at the stage of data analysis and data visualization.
That is how project work is: the difficulties in getting the task done are the whole point, and dealing with problems is how we show our value. If every project ran smoothly with no problems at all, wouldn't it just turn into research?
