How to learn data mining in a systematic way

Source: Internet
Author: User
Tags svm

When reading the algorithmic theory behind data mining, the derivations of some formulas often read like an undecipherable text. In the mathematical proofs for SVM or the EM algorithm, for example, the jumps from one step to the next feel large. So what does a systematic process for learning data mining look like?

There are a few things you should know before you learn data mining:

    • Data mining is not yet widely adopted in China, which can be discouraging.

    • Initial data preparation usually accounts for about 70% of a data mining project's total workload.

    • Data mining itself incorporates disciplines such as statistics, databases, and machine learning, and is not a new technology.

    • Data mining is better suited to being learned by business people (this is more efficient than having technicians learn the business).

    • Data mining is suited to areas that traditional BI (reports, OLAP, etc.) cannot support.

    • Data mining projects often involve repetitive work with little technical content.

If you have read the points above and can accept them, read on.

Learning a technology means staying close to an industry; technology without an industry background is a castle in the air. Technology, especially in the computer field, is broad and changes rapidly (ten years ago, doing web design was enough to start a company), and the average person does not have the time or energy to master every technical detail. Once combined with an industry, however, technical skill can stand on its own: on the one hand it helps you identify users' pain points and essential needs, and on the other it lets you accumulate industry experience. Applying Internet-style thinking across industry boundaries makes success easier to achieve. Don't try to cover everything when learning technology, or you will lose your core competitiveness.

First, the work of data mining practitioners in China today can be broadly divided into three categories.

    • Data analyst: does business consulting, business intelligence, and analysis reports in industries that hold industry data, such as e-commerce, finance, telecommunications, and consulting.

    • Data mining engineer: implements and analyzes machine learning algorithms in big-data-related industries such as multimedia, e-commerce, search, and social networking.

    • Scientific research direction: studies efficiency improvements to new algorithms and their future applications at universities, research units, corporate research institutes, and other high-level research institutions.


Second, the skills required in each area of work.
(1). Data Analyst

    • A solid foundation in mathematics and statistics is required, but programming ability is not.

    • Requires proficiency with mainstream data mining (or statistical analysis) tools such as SAS (Business Analytics and Business Intelligence Software), SPSS, and Excel.

    • Requires an in-depth understanding of all the core data in your industry, along with a degree of trained data sensitivity.

    • Classic book recommendations: "Probability and Mathematical Statistics", "Statistics" (the David Freedman edition is recommended), "Business Modeling and Data Mining", "Introduction to Data Mining", "SAS Programming and Data Mining Business Case", "Clementine Data Mining Methods and Applications", "Excel 2007 VBA Reference Daquan", "IBM SPSS Statistics Statistical Procedures Companion", and so on.

(2). Data Mining Engineer

    • Requires an understanding of the principles and applications of mainstream machine learning algorithms.

    • Requires familiarity with at least one programming language (Python, C, C++, Java, Delphi, etc.).

    • Requires an understanding of database principles and proficiency in at least one database (MySQL, SQL, DB2, Oracle, etc.); understanding how MapReduce works and being skilled with the Hadoop family of tools is even better.

    • Classic book recommendations: "Data Mining: Concepts and Techniques", "Machine Learning in Action", "Artificial Intelligence and Its Applications", "Introduction to Database Systems", "Introduction to Algorithms", "Web Data Mining", "Python Standard Library", "Thinking in Java", "Thinking in C++", "Data Structures", and so on.
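
To make the MapReduce principle mentioned in the list above more concrete, here is a minimal in-memory sketch in Python. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `map_reduce`) and the toy input are assumptions made purely for illustration. It only shows the three conceptual steps: map each record to key/value pairs, group the pairs by key, and reduce each group to one result.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical toy input; a real job would read records from distributed storage.
records = ["data mining", "data warehouse", "mining data"]
counts = map_reduce(records)
print(counts)
```

In a real cluster the map and reduce steps run in parallel on different machines and the shuffle moves data between them; the sketch keeps everything in one process so the data flow is easy to follow.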

(3). Scientific research direction

  • Requires in-depth study of the theoretical foundations of data mining, including association rule mining (Apriori and FP-tree), classification algorithms (C4.5, KNN, logistic regression, SVM, etc.), and clustering algorithms (k-means, spectral clustering). A good first goal is to thoroughly understand the usage, strengths, and weaknesses of the top 10 data mining algorithms.

  • Compared with SAS and SPSS, the R language (The R Project for Statistical Computing) is better suited to researchers: R is completely free, and its open community provides a wide range of add-on packages, making it well suited to research in statistical computing and analysis. Although it is not yet widely used in China, it is strongly recommended.

  • You can try to improve some mainstream algorithms to make them faster and more efficient, for example by implementing an SVM cloud algorithm invocation platform on Hadoop, in which a web project calls a Hadoop cluster.

  • Requires broad and deep reading of papers from the world's leading conferences to track hot technologies, such as KDD, ICML, IJCAI, AAAI (Association for the Advancement of Artificial Intelligence), and ICDM; and of journals in data mining and related fields, such as ACM Transactions on Knowledge Discovery from Data, IEEE Transactions on Knowledge and Data Engineering, the Journal of Machine Learning Research, and IEEE Transactions on Pattern Analysis and Machine Intelligence (on IEEE Xplore).

  • You can try taking part in data mining competitions, such as those run by SIGKDD and Kaggle, to build an all-round ability to solve practical problems.

  • You can try contributing your own code to open source projects such as Apache Mahout (scalable machine learning and data mining) and Myrrix; you can find more interesting projects on SourceForge or GitHub.

  • Classic book recommendations: "Machine Learning", "Pattern Classification", "The Nature of Statistical Learning Theory", "Statistical Learning Methods", "Data Mining: Practical Machine Learning Tools and Techniques", "R in Action". English proficiency is essential for researchers, so also: "Machine Learning: A Probabilistic Perspective", "Scaling Up Machine Learning: Parallel and Distributed Approaches", "Data Mining Using SAS Enterprise Miner: A Case Study Approach", "Python for Data Analysis".
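
As one concrete instance of the association rule mining named in the list above, the following is a minimal sketch of the frequent-itemset stage of Apriori in plain Python. It is a simplified teaching version, not an optimized implementation: the function name `apriori`, the use of an absolute count for `min_support`, and the toy shopping baskets are all assumptions for illustration, and rule generation (confidence, lift) is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    min_support is an absolute count of transactions (an assumption of this
    sketch; many descriptions use a fraction instead).
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions containing each candidate, keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = count({frozenset([item]) for item in items})
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "diapers", "beer"},
]
frequent = apriori(baskets, min_support=3)
for itemset, support in sorted(frequent.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is only counted if all of its smaller subsets are already known to be frequent, which is why studying the algorithm's strengths and weaknesses usually starts with how many candidates survive at each level.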


Third, the following is the working experience of a data mining engineer in the telecommunications industry.

From the standpoint of real data mining project practice, communication skills and an interest in mining matter most. Interest makes you willing to dig into problems; good communication skills let you understand business problems correctly, translate them correctly into mining problems, and express your intentions and ideas clearly to people from different specialties so as to win their understanding and support. So I believe communication skills and interest are an individual's core competitiveness in data mining, and they are hard to learn. Other related professional knowledge can be learned by anyone and does not count as core competitiveness for personal development.

At this point many data warehouse experts, programmers, statisticians, and others may want to throw bricks at me. I'm sorry, I mean no offense: your specialties all matter a great deal to data mining, and everyone is part of one whole. But a single individual has limited time and energy and cannot master all of these fields. In that case, the most important core to choose, I think, is data mining skill together with the relevant business ability.

Consider an extreme example: a mini mining project, which one person who understands marketing and data mining should be able to handle. He may not understand data warehouses, but plain Excel is enough to process up to 60,000 samples. He may not have professional presentation skills, but as long as he can read the results himself, no presentation is needed. Statistical skills, as I said earlier, should be mastered, and they matter a lot in a one-person mini project. He may not know programming, but professional mining tools and mining skills give him enough to work with. So in a mini project, one person with mining skills and marketing knowledge can finish the job successfully, and can even keep drawing different project ideas out of a single data source as business needs dictate. By contrast, a pure data warehouse expert, a pure programmer, a pure presentation specialist, or even a pure mining technologist could not handle that same mini project alone.

This also explains, from another angle, why communication skills matter: if such completely different specialties are to be integrated effectively and organically in a data mining project, how could it work without good communication?

Data mining ability can only be raised and refined in the furnace of project practice, so learning mining by following a project is the most effective shortcut. People who study mining abroad start out by doing projects under their supervisor. It does not matter if you understand little at first: the less you understand, the better you know what you need to learn, and the faster and more effectively you learn it. I do not know how data mining students in China study, but judging from some online forums, much of it is armchair theorizing, which wastes time and is very inefficient.

In addition, the concept of data mining in China is currently very muddled. A lot of BI is limited to report presentation and simple statistical analysis yet is still called data mining. At the same time, the industries in China that have truly implemented data mining at scale can be counted on one hand (banking, insurance, mobile communications); applications in other industries can only be considered small-scale. For example, many universities have related mining topics and projects, but they are scattered and still exploratory. Even so, I believe data mining has good prospects in China, because that is the inevitable course of historical development.

Turning to practical cases in mobile communications: if you come from the mobile industry, you will know of a Chinese company called Huayuan Analytics. (To be clear, I have no connection with this company. As a data mining analyst I have looked at most of the companies in China that claim to offer data mining services, and I think Huayuan is a good one, more practical than many big companies with hollow reputations.) Its business now covers the analysis and mining projects of the vast majority of China's provincial mobile companies; a web search should turn up more detail. What impressed me most about Huayuan is that the company started from scratch in 2002. Not knowing the field did not stop them: they taught themselves while building up their customer base, and they have since spread across China's mobile communications market, which I truly admire. At the beginning they processed data in Excel and compared models by eye, so you can imagine how hard it was.

As for specific data mining applications in mobile communications, there are too many to list: designing different calling plans, customer churn models, cross-selling models for different services, analysis of different customers' elasticity toward discounts, customer segmentation models, customer life cycle models, channel selection models, malicious fraud early-warning models, and so on. Remember: start from the customer's needs and from the problems that arise in practice, and you will find plenty of mining projects in the mobile industry. Finally, a secret: once your data mining ability reaches a certain level, you will find that, whatever the industry, most data mining applications overlap and resemble one another, which makes things feel much easier.
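
To illustrate what a "customer churn model" from the list above can look like at its simplest, here is a hedged sketch: a logistic regression trained by batch gradient descent in plain Python. Everything specific in it is an assumption for illustration, including the two features (monthly complaints and years as a customer), the six toy customers, the learning rate, and the function names. A real churn project would use far more data, proper validation, and most likely an established library.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one customer: predicted probability minus label.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        # Step against the average gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def churn_probability(w, b, x):
    """Estimated probability that a customer with features x will churn."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: [monthly complaints, years as a customer]; 1 = churned.
X = [[3, 0.5], [4, 1.0], [5, 0.5], [0, 4.0], [1, 5.0], [0, 3.0]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
print("new, unhappy customer:", round(churn_probability(w, b, [4, 0.5]), 3))
print("long-standing, content customer:", round(churn_probability(w, b, [0, 4.0]), 3))
```

The modeling itself is short; in line with the point made earlier about data preparation, most of the effort in a real churn project goes into deciding, together with business staff, which customer attributes are worth turning into features.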
