Big Data vs Computer

Source: Internet
Author: User
Tags: svm, uci machine learning repository


Big data work has two directions: one leans toward computer science and the other toward economics. Since you have already learned Java, you can aim for the computer science direction.

Basics:
1. Read "Introduction to Data Mining". This book is easy to understand, has no complex advanced formulas, and is well suited to beginners.
You can also use "Data Mining: Concepts and Techniques" as a reference. It is thicker, but it also covers more about data warehousing.
If you are more interested in algorithms, you can read "Introduction to Machine Learning".
And of course there is "Machine Learning: Practical Case Analysis".

2. Implement the classic algorithms. They fall into several groups:
A. Association rule mining (Apriori, FP-tree, etc.)
B. Classification (C4.5, KNN, logistic regression, SVM, etc.)
C. Clustering (K-means, DBSCAN, spectral clustering, etc.)
D. Dimensionality reduction (PCA, LDA, etc.)
E. Recommender systems (content-based recommendation, collaborative filtering such as matrix factorization, etc.)
Then test your implementations on public datasets to see how well they work. A large number of public datasets can be found at the UCI Machine Learning Repository.
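As a rough idea of what "implement it yourself, then test it" can look like, here is a minimal sketch of KNN in plain Python. The toy points and labels are made up for illustration (they are not from UCI), and the helper names `knn_predict` and `accuracy` are hypothetical; a real test would load an actual public dataset instead.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(points, labels, k=3):
    """Leave-one-out accuracy: predict each point from all the others."""
    correct = 0
    for i, (point, label) in enumerate(zip(points, labels)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_points, rest_labels, point, k) == label:
            correct += 1
    return correct / len(points)

# Toy data, made up for illustration: two well-separated groups.
train_points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
                (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.0, 1.0)))
print(accuracy(train_points, train_labels))
```

Once your own version works, comparing its predictions with a library implementation on the same data is a simple way to catch bugs.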

3. Get familiar with several open source tools: Weka (for getting started), LIBSVM, scikit-learn, and Shogun.

4. Take part in a few 101-level competitions on Kaggle, and learn how to abstract a problem into a model and how to build effective features (feature engineering) from the raw data.
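To make "building features from the raw data" a little more concrete, here is a small sketch that turns a raw event log into one feature row per user. The purchase log, the field layout, and the function name `build_features` are all hypothetical, chosen only for illustration; which features are actually useful depends on the competition.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw purchase log: (user_id, ISO timestamp, amount).
raw_events = [
    ("u1", "2014-03-01T10:15:00", 20.0),  # a Saturday
    ("u1", "2014-03-03T18:40:00", 35.0),  # a Monday
    ("u2", "2014-03-02T09:05:00", 5.0),   # a Sunday
]

def build_features(events):
    """Aggregate raw events into per-user features a model could consume."""
    per_user = defaultdict(list)
    for user_id, timestamp, amount in events:
        per_user[user_id].append((datetime.fromisoformat(timestamp), amount))

    features = {}
    for user_id, rows in per_user.items():
        amounts = [amount for _, amount in rows]
        # weekday() returns 5 for Saturday and 6 for Sunday.
        weekend = sum(1 for when, _ in rows if when.weekday() >= 5)
        features[user_id] = {
            "n_purchases": len(rows),
            "total_spent": sum(amounts),
            "mean_amount": sum(amounts) / len(amounts),
            "weekend_share": weekend / len(rows),
        }
    return features

features = build_features(raw_events)
print(features["u1"])
```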

At this point, basically all the major domestic companies will give you a chance to interview.

Advanced:

1. Read. The following books are voluminous, but you will make great progress by working through them.
A. "Pattern Recognition and Machine Learning"
B. "The Elements of Statistical Learning"
C. "Machine Learning: A Probabilistic Perspective"
The first leans Bayesian; the second leans frequentist; the third sits between the two, although I think it is close to the first with a lot of new content added. Besides these general books, there are many books on specific areas, such as "Boosting: Foundations and Algorithms" and "Probabilistic Graphical Models: Principles and Techniques", as well as more theoretical ones such as "Foundations of Machine Learning" and "Optimization for Machine Learning". The exercises in these books are also very useful: working through them prepares you to derive and write formulas when you write papers.

2. Read papers. This includes several relevant conferences: KDD, ICML, NIPS, IJCAI, AAAI, WWW, SIGIR, ICDM; and several relevant journals: TKDD, TKDE, JMLR, PAMI, etc. Keep track of new techniques and hot topics. Of course, if you do research in this area, this step is necessary. For example, our group's routine is to read papers in the first half of the year, look for problems during the summer vacation, run experiments in autumn, and write and submit papers around the Spring Festival.

3. Track hot topics, such as recommender systems, social networks, and behavioral targeting in recent years; many companies' businesses involve these areas. Also track hot technologies such as deep learning, which is very popular right now.

4. Learn techniques for massively parallel computing, such as MapReduce, MPI, and GPU computing. Virtually every big company uses these technologies, because real-world data volumes are very large and the processing is basically done on computing clusters.
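To get a feel for the MapReduce programming model before touching a real cluster, here is a single-process sketch of the classic word count. It only imitates the map, shuffle, and reduce phases in plain Python; it is not the Hadoop API, and the function names are hypothetical. On a real cluster the framework distributes the map and reduce calls across machines and performs the shuffle for you.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    # Each document is mapped independently, which is what makes
    # the map phase easy to parallelize across machines.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))

docs = ["big data big analytics", "data mining"]
counts = word_count(docs)
print(counts)
```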

5. Take part in real data mining competitions, such as KDD Cup or Kaggle. This will train you to solve a real problem in a short period of time and familiarize you with the whole process of a data mining project.

6. Contribute to an open source project, such as Shogun or scikit-learn mentioned above, or Apache Mahout, or provide a faster and more efficient implementation of some popular algorithm, such as implementing SVM on a MapReduce platform. This also exercises your coding ability.

The following is a summary of my earlier answer to the question: "If you want to work in big data and massive data processing, how do you teach yourself the groundwork?"
Working in data processing, especially big data processing, requires a foundation in both computer science and statistics.

If you have the opportunity to take the following courses at school or to study them on your own, it will help you reach your career goals.

The foundation of the foundation:
Linear algebra, probability theory

Core knowledge:
Mathematical statistics
Predictive models
Machine learning

Computer:
    • Math software: Matlab, with its powerful matrix operations and optimization functions, and the specialized and polished Mathematica.
    • Languages: Fortran (a powerful computational language with fully optimized off-the-shelf code), R (the premium choice compared with Matlab, Java, and C), Python.
    • Visualization

Statistics:
Time series analysis
Applied regression analysis
Multivariate statistical analysis

Highly recommended: distance education at Harvard University Extension School, where you study data science together with Harvard students.

Course materials and exercises: CS109 Data Science

Related questions:

    • Data Science: What are some good free resources to learn data science?
    • Where can I learn pandas or numpy for data analysis?
    • What are some good resources for learning about statistical analysis?
    • Data Science: How do I become a data scientist?
    • What are some good resources for learning about machine learning?
In addition, this is my Zhihu column, where I will keep posting data science articles; you are welcome to follow it: Introduction to Dαγαsciεηce (Dαγαsciεηce, Zhihu column)
