Computing is often used to analyze data, and making sense of that data increasingly relies on machine learning. For many years, machine learning felt remote and out of reach to most developers.
It is now probably one of the most profitable and popular technologies. Without a doubt, machine learning is a skill that developers can build on.
Figure 1: The composition of machine learning
Machine learning is a natural extension of simple data retrieval and storage. It develops a variety of components that make computers learn and behave more intelligently.
Machine learning makes it possible to mine historical data and predict future trends. You may not realize it yet, but you already use machine learning and benefit from it a great deal. Examples include search engine results, online recommendations, advertising, fraud detection, and spam filtering.
Machine learning relies on data for decision making. Intuition, though important, rarely beats empirical data.
All aspects of machine learning
Once you start delving into machine learning, you will encounter the following topics:
1. Supervised and unsupervised learning
2. Classification
3. Markov models, Bayesian networks, and so on
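As a rough illustration of supervised classification from the list above, here is a minimal sketch of a nearest-centroid classifier. The e-mail features, the sample data, and the function names `fit_centroids` and `classify` are all hypothetical and chosen only for this example; real classifiers are considerably more elaborate.

```python
from collections import defaultdict
from math import dist


def fit_centroids(points, labels):
    """Supervised step: average the labeled points of each class."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in groups.items()
    }


def classify(centroids, point):
    """Assign the class whose centroid is closest to the point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical training data: (message length, number of links) per e-mail.
train_points = [(120, 0), (90, 1), (30, 6), (25, 8)]
train_labels = ["ham", "ham", "spam", "spam"]

centroids = fit_centroids(train_points, train_labels)
prediction = classify(centroids, (28, 7))
```

The labels are what make this supervised: the algorithm is told which examples are spam. An unsupervised method would have to discover the two groups without them.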
Mahout and Hadoop
The purpose of the Apache Mahout project is to build a scalable machine learning library.
There is a certain degree of overlap between big data analytics and Hadoop.
Together with Hadoop, Mahout gives you a complete open-source machine learning project for free. For more information, see:
http://mahout.apache.org/
Mahout has built-in clustering, classification, and collaborative filtering algorithms. In addition, it includes:
1. Recommendation systems based on matrix factorization
2. K-means and fuzzy k-means clustering algorithms
3. Latent Dirichlet allocation (LDA)
4. Singular value decomposition
5. Logistic regression classifier
6. (Complementary) naive Bayes classifier
7. Random forest classifier
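To give a feel for one of the algorithms listed above, here is a minimal sketch of plain k-means in Python. It is not Mahout's API (Mahout is a Java library that runs on Hadoop); the function name `k_means`, the sample points, and the choice of initial centroids are assumptions made only for illustration.

```python
from math import dist


def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid update steps."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)), key=lambda i: dist(point, centroids[i])
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, initial_centroids=[points[0], points[3]])
```

The initial centroids are passed in explicitly so that the run is repeatable. Production implementations usually pick them at random or with a seeding scheme, and distribute the assignment step across many machines.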
I looked into the University of California, Berkeley, and found that it offers many good classes.
I wish I had more time. After thinking it over, I decided to start with the MIT online course at the following address:
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-867-machine-learning-fall-2006/index.htm
Azure democratizes machine learning
Machine learning once required sophisticated software, high-end computers, and data scientists. Today, machine learning (that is, predictive analytics) needs only a fully managed cloud service.
Welcome to ML Studio
With drag-and-drop and data flow diagrams, you can run experiments that use large algorithms, much as you would by writing code.
Data scientists write code with R
For statistics and data mining, R is a popular open-source project. The good news is that R can be easily integrated into ML Studio. I have many friends who use functional languages such as F# for machine learning. But R clearly still dominates this field.
Polls and surveys of data miners show that R's popularity has increased in recent years. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which John Chambers is a member. The name R is based mainly on the first initials of its two original authors. R is a GNU project, written mainly in C and Fortran.
Data analysis
The following framework provides a way to understand machine learning predictions. In general, it is about using limited resources to provide decision support that increases revenue or limits costs. Examples include forecasting consumption patterns and optimizing supply chains.
How to analyze data
The best way to understand machine learning is to decompose the analysis into three questions:
1. What happened?
a) A historical point of view
2. What will happen?
a) Predicting the future
3. What should be done next?
a) Norms and guidelines
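The three questions above can be sketched on a toy data set. Everything in this example is assumed for illustration: the sales figures, the straight-line forecast, and the stocking rule are hypothetical and stand in for whatever data and model a real analysis would use.

```python
from statistics import mean

# Hypothetical monthly unit sales, oldest first.
sales = [100, 110, 120, 130, 140]
months = list(range(len(sales)))

# 1. What happened? Summarize the history.
average_sales = mean(sales)

# 2. What will happen? Fit a least-squares line and extrapolate one month.
x_bar, y_bar = mean(months), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales)) / sum(
    (x - x_bar) ** 2 for x in months
)
intercept = y_bar - slope * x_bar
forecast = intercept + slope * len(sales)

# 3. What should be done next? Apply a simple (assumed) stocking rule:
# order enough units to cover the forecast.
current_stock = 135
order_quantity = max(0, round(forecast) - current_stock)
```

Each step builds on the previous one: the history feeds the forecast, and the forecast feeds the decision.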
What role do you play in the analysis process?
1. Information workers
a) Often use self-service tools. Power BI for Office 365 is a self-service business intelligence solution that lets information workers analyze data, gain insight, make predictions, and visualize results through Excel and Office 365.
2. IT experts
a) Data conversion, data warehousing, creation of data analysis cubes, and data modeling
3. Data scientists
a) Deep technical skills, including coding, mathematics, statistics, and probability
b) Produce probabilistic forecasts using a range of techniques (for example, the probability of a price increase in the next 18 hours is 42%)
c) Techniques such as Monte Carlo simulation and model parameterization
d) The qualities of a data scientist:
i. Domain knowledge
ii. A clear understanding of the scientific method: objectives, assumptions, validation, transparency
iii. Strong mathematics and statistics skills
iv. Curiosity and a strong ability to think
v. Graphical presentation and communication skills
vi. Advanced computing and data management capabilities
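The kind of probabilistic forecast attributed to data scientists above (the chance that a price rises within 18 hours) can be estimated with a Monte Carlo simulation. The sketch below assumes a very simple price model, independent and normally distributed hourly returns with no drift; the function name `probability_of_increase` and all parameter values are hypothetical.

```python
import random


def probability_of_increase(start_price, hours, hourly_volatility, trials, seed=0):
    """Estimate P(price after `hours` > start_price) by simulating random walks."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    increases = 0
    for _ in range(trials):
        price = start_price
        for _ in range(hours):
            # Assumed model: independent, normally distributed hourly returns.
            price *= 1 + rng.gauss(0, hourly_volatility)
        if price > start_price:
            increases += 1
    return increases / trials


estimate = probability_of_increase(
    start_price=100.0, hours=18, hourly_volatility=0.01, trials=10_000
)
```

With no drift in the model the estimate lands close to one half. A figure such as 42% would come from a model whose parameters were fitted to real market data, which is where model parameterization comes in.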
Academic background
If you want to go to school to become a data scientist, you can choose from the following fields of study:
1. Applied Mathematics
2. Computer Science
3. Economics
4. Statistics
5. Engineering
Industries benefiting from data science include:
1. Financial Services
2. Telecommunications
3. Information Technology
4. Manufacturing
5. Public Utilities
6. Public Health
7. Marketing
Original content from TechTarget China. All rights reserved. Released with authorization by China Big Data; reprinting is not permitted. TechTarget China reserves the right to pursue legal liability for violations.