Some rambling up front:
First, I have to say the Fish C Markdown text editor is very good and full-featured. Thanks again to Little Turtle's Python videos: they introduced me to programming in the second semester of last year, and I fell in love with it. My major leans toward statistics, so after my internship I decided to go in the direction of data mining, and I have come to appreciate more and more how important the specialized courses are. While everyone else was busy attending various training sessions, I finished watching Little Turtle's Python videos in the cold of last winter. Now, while others are desperately taking part in campus recruitment, I have come to study the algorithms of machine learning (PS: an older student from the engineering school told me that data analysis jobs are hard to find and only go to graduate students). Well, I refuse to believe that blindly. I have just spent two months following Professor Ng's "Machine Learning" course on Coursera (the assignments are quite simple and are done in MATLAB), and the lab happened to buy a copy of "Machine Learning in Action", so I am using it for practice and to improve my Python step by step. Before this I tinkered with all sorts of web back-end work, especially web crawlers, but I don't want to crawl data for other people; I want to analyze data and mine the information hidden in it. A program is just a tool; understanding where the business is heading is what really matters!
Enough chatter. The notes in this series are based on my understanding of the Coursera course, my handwritten notes, and the code in "Machine Learning in Action". I'm not used to updating a blog regularly, but I hope I can stick with it. Let's go!
Body:
Over the past two years many people have heard of "big data", and machine learning has been quietly making its way into data mining. Abroad, of course, data mining is already quite mature, and machine learning algorithms are applied far more widely, including in web search, email classification, robotics, and biological and medical research.
Here are a few specific examples:
- Website data: click data from a website shows how popular a product is;
- Medical data: medical records help you understand a patient's condition and support diagnosis;
- Biology: DNA sequences, for example, can be used to study human traits and even genetic information;
- Engineering: guiding autonomous UAV flight, handwriting recognition, NLP (natural language processing), and computer vision;
- Recommendation systems: Amazon's product recommendation system (arguably this belongs under website data).
After rambling on for so long: what is machine learning?
Here are two definitions:
- Plain version: the study of giving machines the ability to learn, where that ability is not hard-coded into a fixed program but is a self-learning behavior of the machine itself.
- Academic version: a computer program is said to learn from experience E with respect to some task T and performance measure P if its performance on T, as measured by P, improves with experience E.
The academic version is, well, academic and rather stiff, which is one of the reasons I'm not applying to graduate school; so dry~ Here is a more approachable example:
Playing checkers:
E = the experience gained from playing many games of checkers
T = the task of playing checkers
P = the probability that the program wins its next game of checkers
Machine learning mainly deals with two tasks: classification and regression. The former is easy to understand: a prediction task that assigns data to categories. The latter, regression, comes from statistics and is used to predict numeric values. Anyone who has done mathematical modeling will be familiar with curve fitting, and yes, that is a very important regression task: given a data set, fit the optimal curve so that it reflects the trend of the data as well as possible, with the data points lying on or near the curve without overfitting. Machine learning is divided into "supervised learning" and "unsupervised learning", and both classification and regression belong to supervised learning. The focus of this article is the difference between supervised and unsupervised learning; later articles will be organized around these two kinds of learning, and there will be plenty of detail on regression and classification as well.
Example 1: House price prediction (linear regression)
Suppose you have a pile of data on house prices and house sizes and want to estimate the price from the size. Plotting the data gives a scatter plot (my drawing is ugly; no flaming, please).
You fit a straight line and a curve to the distribution of the data. At the point X1 the two fits predict Y1 and Y2 respectively, so different curves give different predictions. Why do I call this price prediction a kind of "supervised learning"? Because the correct answers are given: in the data set, each house size comes with its price. In other words, this type of algorithm knows exactly what it is predicting (here, the price), and the target variable is clearly defined.
Problems like the one above are also called regression problems: they predict continuous output values.
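As a rough sketch of the idea above, the following pure-Python snippet fits both a straight line and a quadratic curve to some house data by least squares and compares the two predictions at a new size x1. The data values, the helper names `fit_polynomial` and `predict`, and the choice of a quadratic for "the curve" are all assumptions made up for illustration; they do not come from the course or the book.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) c = A^T y.

    Returns coefficients c where c[k] multiplies x**k.
    """
    n = degree + 1
    # ata[i][j] = sum of x^(i+j); aty[i] = sum of y * x^i
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]

    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(ata[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aty[i] - known) / ata[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Made-up data: house size (square meters) and price (arbitrary units)
sizes = [50, 70, 90, 110, 130, 150]
prices = [60, 95, 120, 138, 150, 156]

line = fit_polynomial(sizes, prices, 1)   # straight line
curve = fit_polynomial(sizes, prices, 2)  # quadratic curve

x1 = 170
y1 = predict(line, x1)
y2 = predict(curve, x1)
print(f"line  predicts {y1:.1f} at x1 = {x1}")
print(f"curve predicts {y2:.1f} at x1 = {x1}")
```

Because every size in the training data comes with a known price, both fits are supervised; they simply disagree about what happens beyond the data.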
Example 2: Tumor prediction, benign vs. malignant (logistic regression)
The "X" marks represent the data set: each one records whether a tumor of a given size is malignant. If the tumor is malignant the value is 1; otherwise it is 0. This is a typical binary problem, also known as a logistic regression problem, commonly used for classification: the output values are discrete (0 or 1).
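To make the tumor example a little more concrete, here is a minimal sketch of logistic regression on a single feature (tumor size), trained with plain gradient descent in pure Python. The sizes, labels, learning rate, and the function names `train_logistic` and `predict_label` are assumptions chosen for illustration, not values from the course.

```python
import math


def sigmoid(z):
    """Map any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=10000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_label(w, b, x):
    """Return 1 (malignant) if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Made-up data: tumor size -> 1 (malignant) or 0 (benign)
tumor_sizes = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(tumor_sizes, labels)
print("size 2 ->", predict_label(w, b, 2))
print("size 7 ->", predict_label(w, b, 7))
```

The model outputs a probability between 0 and 1, and thresholding it at 0.5 gives the discrete answer that makes this a classification problem rather than a regression problem.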
Of course, in a real prediction, deciding whether a tumor is malignant depends on many attributes, such as the thickness of the tumor clump, cell shape, and so on, and many factors affect tumor size as well, such as age. With so many attributes, fitting the data by drawing plots becomes rather inefficient, which is why the "support vector machine" is introduced. We will discuss it in a later post; if you are interested, you can Google it in the meantime.
Unsupervised learning, as the name implies, is learning in which no correct answer is given.
First, an example:
Put simply, you are given a pile of data, say small black circles representing the data set, and asked to find the structure in it, that is, to cluster it (as the saying goes, birds of a feather flock together). Obviously there is no standard answer: you could group the data into 2 clusters following the red ovals, into 3 clusters following the regions outlined in purple, or into 2 clusters following the blue squares. Nobody can say your clustering is wrong, as long as you can give your reasons.
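The point that the same data can reasonably be grouped into 2 or 3 clusters can be sketched with a tiny k-means implementation. k-means is only one clustering algorithm, picked here as an assumption for illustration; the points, the deterministic choice of starting centroids, and the function name `kmeans` are likewise made up.

```python
def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mean(points):
    """Centroid (coordinate-wise average) of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def kmeans(points, k, iterations=20):
    """Very small k-means: returns (centroids, labels)."""
    # Deterministic start (an assumption): k points spread evenly through the list
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty
        centroids = [mean(cl) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
              for p in points]
    return centroids, labels


# Made-up unlabeled data: three loose groups of points
points = [(1, 1), (1.5, 2), (2, 1),
          (5, 5), (5.5, 6), (6, 5),
          (9, 1), (9.5, 2), (10, 1)]

_, labels2 = kmeans(points, 2)
_, labels3 = kmeans(points, 3)
print("2 clusters:", labels2)
print("3 clusters:", labels3)
```

Neither result is "wrong": no labels were given, so the number of clusters is a choice you have to justify yourself.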
Unsupervised learning may seem to follow no rules, but it has a wide range of applications: organizing computer clusters, social network analysis, market segmentation, and astronomical data analysis. Big data still holds a great deal to explore, and the unknown is often unfathomable, so the waters of unsupervised learning run quite deep~
Well, that's all for now. Swaiiow is off to take a nap, and this afternoon it's back to the Coursera course, which is in its fourth week. Friends who are interested are welcome to join and become my classmates~
Preview: linear regression and the gradient descent algorithm.
Python machine learning "Getting Started"