I am learning Java and want to try big data and data mining. How should I plan my learning?

Last Update:2015-10-26 Source: Internet

Author: User

Tags svm uci machine learning repository

Copyright belongs to the author.
For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.
Author: Tan Xin
Link: http://www.zhihu.com/question/21380122/answer/22156159
Source: Zhihu

Big data work goes in two directions: one leans toward computer science and the other toward economics. Since you have already learned Java, you can aim for the computer science side.

Basics:
1. Read "Introduction to Data Mining". This book is very easy to understand, contains no complex advanced formulas, and is well suited to getting started.
You can also use "Data Mining: Concepts and Techniques" as a reference. It is thicker, but it also covers more about data warehousing.
If you are more interested in the algorithms, you can read "Introduction to Machine Learning".
Of course, there is also "Machine Learning: Practical Case Analysis".

2. Implement the classic algorithms. There are several groups:
A. Association rule mining (Apriori, FP-tree, etc.)
B. Classification (C4.5, KNN, logistic regression, SVM, etc.)
C. Clustering (k-means, DBSCAN, spectral clustering, etc.)
D. Dimensionality reduction (PCA, LDA, etc.)
E. Recommender systems (content-based recommendation; collaborative filtering, such as matrix factorization; etc.)
Then test your implementations on public datasets to see how well they work. A large number of public datasets can be found at the UCI Machine Learning Repository.
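
As one concrete instance of step 2, here is a minimal sketch of KNN classification written from scratch in Python and checked against a small held-out set. The data is a made-up toy set, not a UCI dataset, and the function names are illustrative only; in practice you would load a real public dataset and split it into training and test parts.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest points.

    `train` is a list of (features, label) pairs.
    """
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out points whose predicted label matches the true one."""
    hits = sum(1 for features, label in test
               if knn_predict(train, features, k) == label)
    return hits / len(test)


# Made-up toy data (an assumption for illustration): two separated groups in 2-D.
TRAIN = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
TEST = [((1.1, 0.9), "a"), ((5.1, 5.0), "b")]

if __name__ == "__main__":
    print(knn_predict(TRAIN, (1.1, 1.0)))
    print(accuracy(TRAIN, TEST))
```

Sorting every training point for each query is fine for a first implementation; a spatial index such as a k-d tree is the usual next step once the dataset grows.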

3. Get familiar with several open source tools: Weka (for getting started), LIBSVM, scikit-learn, and Shogun.

4. Enter a few 101-level competitions on Kaggle to learn how to abstract a problem into a model and how to build effective features from the raw data (feature engineering).
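
To make "building features from the raw data" concrete, here is a small Python sketch that turns one raw record into a numeric feature vector. The record schema (`time`, `category`, `amount`) and the category vocabulary are hypothetical, chosen only for illustration; real competition data will look different.

```python
import math
from datetime import datetime

# Hypothetical category vocabulary, assumed to be known in advance.
CATEGORIES = ["books", "food", "games"]


def build_features(record):
    """Convert a raw record (a dict) into a list of numeric features."""
    ts = datetime.strptime(record["time"], "%Y-%m-%d %H:%M")
    # One-hot encode the categorical field.
    one_hot = [1.0 if record["category"] == c else 0.0 for c in CATEGORIES]
    return [
        float(ts.hour),                      # hour of day
        1.0 if ts.weekday() >= 5 else 0.0,   # weekend flag (Saturday or Sunday)
        math.log1p(record["amount"]),        # compress a skewed amount
    ] + one_hot


if __name__ == "__main__":
    raw = {"time": "2015-10-24 21:30", "category": "food", "amount": 12.5}
    print(build_features(raw))
```

Each derived value encodes a guess about what matters for the prediction, for example that behavior differs on weekends; checking such guesses against validation results is most of the work in feature engineering.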

At this point, most major domestic companies will give you an interview.

Advanced:

1. Read books. The following are all thick volumes, but you will make great progress by working through them.
A. "Pattern Recognition and Machine Learning"
B. "The Elements of Statistical Learning"
C. "Machine Learning: A Probabilistic Perspective"
The first leans Bayesian and the second leans frequentist. The third sits between the two, although I find it close to the first with a lot of newer content added. Besides these thick volumes, there are many books on specific areas, such as "Boosting: Foundations and Algorithms" and "Probabilistic Graphical Models: Principles and Techniques", as well as more theoretical ones, such as "Foundations of Machine Learning" and "Optimization for Machine Learning". The end-of-chapter exercises in these books are also very useful: working through them makes it easier to derive formulas when you write papers.

2. Read papers. The relevant conferences include KDD, ICML, NIPS, IJCAI, AAAI, WWW, SIGIR, and ICDM; the relevant journals include TKDD, TKDE, JMLR, and PAMI. Keep track of new techniques and hot topics. Of course, if you work in this area, this step is a must. For example, our group's routine is to read papers in the first half of the year, look for problems during the summer vacation, run experiments in the autumn, and write and submit papers around the Spring Festival.

3. Track hot topics. In recent years these have included recommender systems, social networks, and behavior targeting, and many companies' businesses involve these areas. Also track hot techniques, such as deep learning, which is very popular right now.

4. Learn techniques for massively parallel computing, such as MapReduce, MPI, and GPU computing. Virtually every big company uses these technologies, because real-world data volumes are very large and the processing is basically done on computing clusters.
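
To give a feel for the MapReduce programming model, here is a single-process Python sketch of the classic word count. It only imitates the three stages (map, shuffle, reduce) in memory; it does not use Hadoop or any real cluster API, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    emitted = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

On a real cluster the mappers and reducers run on different machines and the shuffle moves data over the network, which is why an algorithm has to be expressed as independent per-record and per-key steps.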

5. Participate in real data mining competitions, such as KDD Cup or Kaggle. This will train you to solve a real problem in a short period of time and make you familiar with the whole process of a data mining project.

6. Participate in an open source project, such as Shogun or scikit-learn mentioned above, or Apache Mahout, or provide a faster, more efficient implementation of a popular algorithm, such as implementing SVM on a MapReduce platform. This also exercises your coding ability.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
