The Big Data Bubble Era: It's Time to Return to Rationality

With the arrival of the big data age, society's understanding and evaluation of data talent has inflated into a bubble, and it is time to return to rationality. Judging from job postings on the recruitment site Zhilian Zhaopin, many companies hiring senior data analysts make a similar demand: proficiency with SAS, SPSS, R, and similar tools. These are all statistical software packages, and the algorithms inside them were devised in the last century by people who knew nothing about these companies' businesses. Since this is knowledge from the field of statistics, why emphasize this particular part of it? Is the rest of statistics less important? Do companies not need it?

The book by Lu Hui, a data analysis expert at Alibaba, has similar problems. Many readers fall under a halo effect: because the data comes from Alibaba, they assume Alibaba's data analysis experts must be right about everything, when in fact Alibaba's growth is the work of an entire team over many years. Some of the more perceptive people are now slowly, if vaguely, sensing that something about this bubble is wrong. I studied statistics, and my willingness to speak the truth has earned me the attention and guidance of two respected figures in China. Experience cannot be conveyed in a single article, so I will only discuss some of the views in Lu Hui's book "Data Mining and Data-Driven Operations in Practice" (hereafter "Lu's book"), in the hope of helping society view data professionals more rationally.

I read Lu's book at the end of last year. I agree with it in some places, but the book also has many problems.

Consider an analogy. We all know that the same symptom, such as dizziness, can have different underlying causes, which is why medical students study the whole of medicine and rotate through every department during their internship. A doctor whose knowledge is not broad enough can easily misdiagnose a patient. If you accept this analogy, it is easy to see that in statistics, too, knowledge that is not broad enough leads to problems.

For example, page 17 of Lu's book says: "Data mining does not require a specialized statistics background, but it must be emphasized that basic statistical knowledge and skills are essential." What counts as basic? Knowing the law is basic. The Statistics Law stipulates that the powers of statistics are investigation, reporting, and supervision; the state emphasizes investigation, not statistical analysis. Chapter 2 discusses the difference between statistical analysis and data mining, and the book's contents show that the author's understanding of statistics stops at statistical analysis. What is wrong with that?

Chapter 6, a complete case study of a data mining project, deals with a company that is losing users. The natural first step is to investigate the causes. For some causes, analyzing logs of user behavior data reveals the approximate problem; for others, the company has no relevant data and must conduct research, including market research or business surveys. Whether or not relevant user behavior data exists, all of this falls within the scope of statistics.

But the method in chapter 6 of Lu's book consumes a great deal of manpower and resources without producing the answer everyone actually cares about. The book describes its approach as follows: "This case focuses on three aspects: 1. putting the model into use to identify, in advance, the highly active user groups at high risk of churn; 2. selectively providing operations staff with the valuable fields and indicators, discovered during modeling, that are most likely to affect churn; 3. providing the core indicators and fields that affect churn to the business side as reference clues." In other words, it spends a great deal of time and labor without directly answering why users leave. Where there is no relevant data, people who do not understand surveys and do not want to conduct them simply say that this is outside the scope of their work.

In addition, the cover of Lu's book reads "business as the core, thinking as the focus, mining technology as the aid." I agree with this, but the contents of the book violate the principle in many places. For example, if business comes first and thinking is the focus, business analysis and reporting should be logical and readable. Yet page 17 of Lu's book says of neural network mining techniques that "the hidden layer is a black box that no one can fully interpret in every case. In practice, this often confuses analysts or business people who are accustomed to statistical analysis," and argues that as long as the model correctly predicts customer behavior, the business and operations departments need not understand the technical details. Why shouldn't they understand them? Under "business as the core, thinking as the focus," a calculation that does not fit the business logic should be replaced by another method. Lu's book instead takes a "mining technology first, thinking as the aid" approach, which leaves readers feeling that anything is acceptable as long as it predicts user behavior correctly. Imagine the black-box algorithm produces a wrong prediction: how easy would it be to find the problem and fix it?

When business people do not understand the calculations being used, the approach suggested on page 59 of Lu's book is that the business team should be able to "understand the data analyst's analytical report." This once again offers an excuse for a problem that cannot be explained. The approach that truly puts business at the core and thinking at the center is to require that data analysts' reports be understandable to the business team. The Statistics Law stipulates that the powers of statistics are investigation, reporting, and supervision, and a report must at the very least be understandable and readable to others. Lu's book turns this logic upside down. Data analysts and data miners who emphasize tools such as SAS, SPSS, and R tend to write reports in the way Lu's book describes, or simply do not write reports at all.

Data analysis and data mining have emerged only in recent years, and they use just the small part of statistical knowledge that the Internet industry needs. Is the rest of statistics therefore useless? Is society's high regard directed at data professionals or at statistics? The statistics bureau ought to be a household name, so why do so many data professionals avoid mentioning it and even want to distance themselves from surveys? I have interviewed data analysts at many companies. They say they like statistics, but when pressed, it turns out they only like the data analysis part of the work, which reflects the current state of society and problems in education. They do only one part of statistical work, the data analysis that serves its reporting function, yet they ask society to value only that small part of statistics?

How hard survey work is, you know. This article represents the author's independent opinion and does not represent the position of Huxiu.

This article was contributed by Jensen111 and edited by Huxiu. Reprinting requires the author's approval, and reprints must credit the source (Huxiu) and link to this page. Original link: http://www.huxiu.com/article/38285/1.html
