Take a look at the experts' data mining learning experience

Source: Internet
Author: User
Tags: svm

Just a few words to share:

The basics:

1. Read "Introduction to Data Mining". This book is very easy to understand, contains no complicated advanced formulas, and is well suited for beginners. You can also use "Data Mining: Concepts and Techniques" as a reference. The latter is thicker and covers a bit more about data warehousing. If you prefer the algorithmic side, you can read "Introduction to Machine Learning".

2. Implement the classic algorithms. There are several parts:
A. Association rule mining (Apriori, FP-tree, etc.)
B. Classification (C4.5, KNN, logistic regression, SVM, etc.)
C. Clustering (k-means, DBSCAN, spectral clustering, etc.)
D. Dimensionality reduction (PCA, LDA, etc.)
E. Recommender systems (content-based recommendation, collaborative filtering such as matrix factorization, etc.)
Then test your implementations on public datasets to see how well they work (a minimal k-means sketch follows below). A large number of public datasets can be found at: http://archive.ics.uci.edu/ml/
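To make step 2 concrete, here is a minimal sketch (my own illustration, not part of the original advice) of one classic algorithm, k-means, implemented from scratch and sanity-checked on the UCI Iris dataset, loaded through scikit-learn's bundled copy for convenience; the function name and parameters are mine.

```python
# Minimal k-means implementation, checked on the UCI Iris dataset
# (loaded via scikit-learn's bundled copy for convenience).
import numpy as np
from sklearn.datasets import load_iris

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k random points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids as the mean of each cluster; keep the old
        # centroid if a cluster happens to be empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X, y = load_iris(return_X_y=True)
labels, centroids = kmeans(X, k=3)
print("cluster sizes:", np.bincount(labels))
```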

3. Get familiar with several open-source tools: Weka (good for getting started), LIBSVM, scikit-learn, Shogun.
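To give a feel for what these tools look like in practice, here is a small sketch (mine, not from the original text) that uses scikit-learn to train an SVM classifier (scikit-learn's SVC wraps LIBSVM internally) on one of its bundled datasets:

```python
# Quick scikit-learn example: train an SVM classifier and evaluate it
# on a held-out split.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Standardize features, then fit an RBF-kernel SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```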

4. Go to https://www.kaggle.com/ and enter a few of the 101-level competitions to learn how to abstract a problem into a model and build effective features from raw data (feature engineering).
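As an illustration of feature engineering (a hypothetical sketch; the column names and transactions are made up and do not come from any particular Kaggle competition), raw event records are often aggregated into per-entity features like this:

```python
# Hypothetical feature-engineering sketch: turn raw transaction logs into
# per-user features a model can consume. All column names are made up.
import pandas as pd

raw = pd.DataFrame({
    "user_id":   [1, 1, 2, 2, 2, 3],
    "timestamp": pd.to_datetime(["2023-01-01", "2023-01-05", "2023-01-02",
                                 "2023-01-03", "2023-01-20", "2023-01-10"]),
    "amount":    [10.0, 25.0, 5.0, 7.5, 100.0, 42.0],
    "category":  ["food", "travel", "food", "food", "electronics", "travel"],
})

# Aggregate numeric behavior per user.
features = raw.groupby("user_id").agg(
    n_orders=("amount", "size"),
    total_spent=("amount", "sum"),
    avg_spent=("amount", "mean"),
    days_active=("timestamp", lambda s: (s.max() - s.min()).days),
)

# One-hot encode each user's most frequent category.
top_cat = raw.groupby("user_id")["category"].agg(lambda s: s.mode().iloc[0])
features = features.join(pd.get_dummies(top_cat, prefix="top_cat"))
print(features)
```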

At this point, most major domestic companies will basically give you an interview opportunity.

Advanced:

1. Reading. The following books are voluminous, but working through them brings great progress:
A. "Pattern Recognition and Machine Learning"
B. "The Elements of Statistical Learning"
C. "Machine Learning: A Probabilistic Perspective"
The first leans Bayesian; the second leans frequentist; the third sits somewhere between the two, though I think it is closer to the first while adding a lot of new material. Of course, besides these comprehensive texts there are many books on specific areas, such as "Boosting: Foundations and Algorithms" and "Probabilistic Graphical Models: Principles and Techniques", as well as some more theoretical ones such as "Foundations of Machine Learning" and "Optimization for Machine Learning". The exercises in these books are also very useful; doing them makes it much easier to write out the formulas when you write papers.

2. Read papers. This includes several related conferences: KDD, ICML, NIPS, IJCAI, AAAI, WWW, SIGIR, ICDM; and several related journals: TKDD, TKDE, JMLR, PAMI, etc. Keep track of new techniques and hot problems. Of course, this step is necessary if you do related work. For example, our group's rhythm is to read papers in the first half of the year, look for problems over the summer vacation, run experiments in the autumn, and write and submit papers around the Spring Festival.

3. Track hot topics, for example recommender systems, social networks, and behavioral targeting in recent years; many companies' business involves these areas. Also track hot techniques such as deep learning, which is extremely popular right now.

4. Learn techniques for large-scale parallel computing, such as MapReduce, MPI, and GPU computing. These technologies are used by virtually every big company, because real-world data volumes are very large and the work is basically done on computing clusters.
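As a rough illustration of the map/reduce idea only (a local toy, not an actual Hadoop, Spark, or MPI program), the computation splits into a map phase run in parallel over chunks of data and a reduce phase that merges the partial results:

```python
# Toy map/reduce word count run locally with a process pool. Real MapReduce
# systems (Hadoop, Spark) distribute the same two phases across a cluster.
from collections import Counter
from multiprocessing import Pool

def map_chunk(lines):
    # Map phase: count words within one chunk of the data.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    # Reduce phase: merge the per-chunk counts into a global count.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    corpus = ["the quick brown fox", "the lazy dog", "the quick dog"] * 1000
    chunks = [corpus[i::4] for i in range(4)]       # split the data into 4 chunks
    with Pool(4) as pool:
        partials = pool.map(map_chunk, chunks)      # parallel map
    print(reduce_counts(partials).most_common(3))   # reduce
```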

5. Participate in actual data mining contests, such as the KDD Cup or the competitions on https://www.kaggle.com/. This process trains you to solve a real problem in a short period of time and makes you familiar with the whole workflow of a data mining project.

6. Participate in an open-source project, such as the Shogun or scikit-learn mentioned above, or Apache Mahout, or provide a more efficient implementation of some popular algorithm, such as implementing SVM on a Map/Reduce platform. This also exercises your coding ability.
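As a very simplified sketch of what "SVM on a Map/Reduce platform" could look like in spirit (my own simplification, not how any particular library actually does it): train independent linear SVMs on data partitions in a "map" step and average their weights in a "reduce" step.

```python
# Simplified data-parallel "map/reduce" SVM: fit a linear SVM on each data
# partition, then average the weight vectors. A crude approximation of
# distributed training, shown only to illustrate the idea.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
partitions = np.array_split(np.arange(len(X)), 4)   # pretend these live on 4 nodes

# "Map": fit one model per partition.
models = [LinearSVC(C=1.0, dual=False).fit(X[idx], y[idx]) for idx in partitions]

# "Reduce": average coefficients and intercepts into a single linear model.
w = np.mean([m.coef_ for m in models], axis=0)
b = np.mean([m.intercept_ for m in models], axis=0)

# Use the averaged model for prediction.
scores = X @ w.ravel() + b
pred = (scores > 0).astype(int)
print("training accuracy of averaged model:", (pred == y).mean())
```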

Having reached this step, you can basically pick and choose among the large domestic companies, and the compensation is not bad; if your English is good, it is not very difficult to go to a US company either.

Summaries of other experts' experience:

A:

In an actual project:

  1. First, clarify what you want to mine in order to generate business value, not which mining algorithm to use; the algorithm is only a means and can be dealt with later. Be able to describe your mining goal, its value, the expected presentation of the results, how persuasive they are, and so on;
  2. Second, figure out what data you need in order to mine the results you want, and discuss it with the relevant colleagues. Which of these data are already available, and which still have to be collected? Is it possible that some of the data simply cannot be collected? How does the data you cannot collect affect the results you want? If the effect is fatal and directly makes the mining results unconvincing, then stop and find another direction. Otherwise, arrange a plan and resources to collect the data that can be collected as soon as possible;
  3. Third, clean the collected data according to its characteristics and the quality of the collection process;
  4. Fourth, based on the mining goal and the characteristics of the collected data, work out a mining plan and choose suitable mining algorithms;
  5. Then, start mining;
  6. What about the first round of results? Do they make sense? Are they persuasive? In most cases you will find: oops, I forgot to take these factors into account and need to bring in data on those aspects as well. OK, go back to step 2: keep collecting data, cleaning, tuning the algorithm and parameters, mining again, and evaluating again; it usually cycles like this for n rounds (a minimal sketch of this clean/model/evaluate loop follows after this list);
  7. Little by little, you work out a nearly reliable first draft that can just barely justify itself, and the results start to look like something;
  8. Summarize a statement (an analysis of the results). For this statement, clean the data a few more times in a targeted way and produce a cleaner set of analysis results; this version is basically persuasive;
  9. Polish it a little, then draw an infographic or something, with text and figures, and you can hand in a first version of the homework;
  10. In a real project there is one more step: choose the important evaluation angles and indicators and, according to the specific characteristics of the business, turn your analysis process into a weekly/daily/hourly service that provides analysis from these fixed angles;
  11. One more step forward: if you are really familiar with the business, you can also give corresponding response measures (actions) for the different types of analysis results, so that the business value of the mining becomes truly clear. The work you do then no longer stops at analysis; it rises to the level of decision support.
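A minimal sketch of the clean/model/evaluate loop in steps 3 to 6 above (my own illustration; the DataFrame here is synthetic and only stands in for collected business data):

```python
# Minimal clean -> model -> evaluate loop, mirroring steps 3-6 above.
# The data is synthetic and only stands in for collected business data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Step 3: clean the collected data (here, fill missing values with medians).
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0] * 50,
    "feature_b": [0.5, 0.7, 0.2, np.nan, 0.9, 0.1, 0.3, 0.8] * 50,
    "label":     [0, 1, 0, 1, 1, 0, 0, 1] * 50,
})
df = df.fillna(df.median(numeric_only=True))

# Step 4: choose an algorithm appropriate to the goal and the data.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# Steps 5-6: mine, evaluate, and decide whether another round is needed.
X, y = df[["feature_a", "feature_b"]], df["label"]
scores = cross_val_score(model, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
# If the result is unconvincing, go back: collect more data, re-clean,
# engineer better features, tune the algorithm, and evaluate again.
```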
B:

Not an expert, just a little experience.

Build up the related mathematical background: (numerical) linear algebra, statistics (multivariate, Bayesian), and optimization. Read the good books, PRML and ESL, work through the formulas in them, implement the experiments in code, and read papers carefully for the details. Take part in some projects or competitions.
