I am learning Java and want to get into big data and data mining. How should I plan my studies?

Copyright belongs to the author.
For commercial reprints, please contact the author for authorization; for non-commercial reprints, please cite the source.
Author: Tan Xin
Link: http://www.zhihu.com/question/21380122/answer/22156159
Source: Zhihu

Big data has two directions: one leans toward computer science, the other toward economics. Since you have learned Java, you can try the computer science direction.

Basics:
1. Read "Introduction to Data Mining". This book is easy to follow, has no complex advanced formulas, and is well suited to beginners.
You can also use "Data Mining: Concepts and Techniques" as a reference. It is thicker, but it also covers more on data warehousing.
If you are more interested in the algorithms, you can read "Introduction to Machine Learning".
And of course there is "Machine Learning: Practical Case Analysis".

2. Implement the classic algorithms yourself. They fall into several groups:
A. Association rule mining (Apriori, FP-tree, etc.)
B. Classification (C4.5, KNN, logistic regression, SVM, etc.)
C. Clustering (k-means, DBSCAN, spectral clustering, etc.)
D. Dimensionality reduction (PCA, LDA, etc.)
E. Recommender systems (content-based recommendation, collaborative filtering such as matrix factorization, etc.)
Then test on public datasets to see how well your implementations work. A large number of public datasets can be found at the UCI Machine Learning Repository.
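
As one illustration of what "implement the classic algorithms" can look like, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not taken from any of the books above. The toy data stands in for a real UCI dataset, and seeding the centroids with the first k points is an assumption made only to keep the run deterministic.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption for this sketch: the first k points seed the centroids,
    # which keeps the run deterministic. Real code usually seeds randomly.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Toy data standing in for a real dataset: two well-separated groups.
data = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

Once a version like this works on toy data, comparing its output with a library implementation on a public dataset is a reasonable way to check it.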

3. Get familiar with several open source tools: Weka (good for getting started), LIBSVM, scikit-learn, and Shogun.

4. Enter a few introductory (101) competitions on Kaggle to learn how to abstract a problem into a model and how to build effective features from raw data (feature engineering).
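
To make "building features from raw data" concrete, the sketch below turns raw records into numeric vectors. The records, field names, and the fixed list of channels are all hypothetical, invented for illustration; a real competition dataset would need its own choices.

```python
import math
from datetime import datetime

# Hypothetical raw records; the field names are invented for illustration.
raw = [
    {"time": "2014-01-04 23:15", "channel": "web", "amount": 120.0},
    {"time": "2014-01-06 09:30", "channel": "mobile", "amount": 8.0},
]

CHANNELS = ["web", "mobile"]  # assumed closed set of categories

def to_features(record):
    """Turn one raw record into a flat numeric feature vector."""
    ts = datetime.strptime(record["time"], "%Y-%m-%d %H:%M")
    features = [
        ts.hour,                        # time of day
        1 if ts.weekday() >= 5 else 0,  # weekend flag (Sat=5, Sun=6)
        math.log1p(record["amount"]),   # compress a skewed amount
    ]
    # One-hot encode the categorical field.
    features += [1 if record["channel"] == c else 0 for c in CHANNELS]
    return features

rows = [to_features(r) for r in raw]
print(rows)
```

Each output row has the same length and contains only numbers, which is the form most of the classifiers listed earlier expect.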

By this point, most major domestic companies will give you an interview.

Advanced:

1. Read. The following books are voluminous, but you will progress a great deal by working through them.
A. "Pattern Recognition and Machine Learning"
B. "The Elements of Statistical Learning"
C. "Machine Learning: A Probabilistic Perspective"
The first leans Bayesian; the second leans frequentist; the third sits between the two, though I find it similar to the first, with a lot of newer material added. Besides these comprehensive tomes, there are many books on specific areas, such as "Boosting: Foundations and Algorithms" and "Probabilistic Graphical Models: Principles and Techniques", as well as more theoretical ones such as "Foundations of Machine Learning" and "Optimization for Machine Learning". The end-of-chapter exercises in these books are also very useful; working through them is what lets you derive formulas when you write papers.

2. Read papers. The relevant conferences include KDD, ICML, NIPS, IJCAI, AAAI, WWW, SIGIR, and ICDM; the relevant journals include TKDD, TKDE, JMLR, and PAMI. Keep track of new techniques and hot topics. If you do research in this area, this step is essential. For example, our group's routine is to read papers in the first half of the year, look for problems over the summer, run experiments in the autumn, and write and submit papers around the Spring Festival.

3. Track hot topics. In recent years these have included recommender systems, social networks, and behavior targeting, and many companies' businesses touch on these areas. There are also hot techniques, such as deep learning, which is very popular right now.

4. Learn techniques for large-scale parallel computing, such as MapReduce, MPI, and GPU computing. Virtually every big company uses these technologies, because real-world data volumes are very large and the processing is basically done on computing clusters.
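
For readers new to MapReduce, the following is a single-process sketch of the programming model only: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step combines each group. It does not use Hadoop or any cluster API, and the function names are chosen for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)

documents = ["big data big analytics", "data mining"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

On a real cluster the mappers and reducers run on different machines and the framework performs the shuffle; the point of the model is that each mapper call and each reducer call is independent, so the work can be spread out.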

5. Participate in real data mining competitions, such as KDD Cup or Kaggle. This trains you to solve a real problem in a short period of time and familiarizes you with the whole process of a data mining project.

6. Participate in an open source project, such as Shogun or scikit-learn mentioned above, or Apache Mahout. Alternatively, contribute a more efficient implementation of a popular algorithm, such as implementing SVM on a MapReduce platform. This also exercises your coding ability.
