Features of machine learning
Machine learning is the discipline in which a computer builds probabilistic and statistical models from data and uses those models to predict and analyze data. Its main features are:
- It is built on computers and networks.
- It takes data as its object of study and is a data-driven discipline.
- Its goal is the prediction and analysis of data.
- It is method-centered: machine learning methods build models and apply them to prediction and analysis.
- It is an interdisciplinary field drawing on probability theory, statistics, information theory, computation theory, optimization theory, and other areas.
The object of machine learning
The object of machine learning is data. It starts from data, extracts features from the data, abstracts a model of the data, discovers knowledge in the data, and then returns to the analysis and prediction of data. The basic assumption machine learning makes about data is that data of the same kind exhibit certain statistical regularities; this is the premise of machine learning.
The purpose of machine learning
The purpose of machine learning is to predict and analyze data, especially to predict and analyze unknown new data.
The method of machine learning
Machine learning can be divided into:
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning
Three elements of machine learning:
- Model: the first thing to consider in machine learning is the model; in supervised learning, the model is the conditional probability distribution or decision function to be learned.
- Strategy: the criterion by which the optimal model is learned or selected.
- Algorithm: the concrete computational procedure used to learn the model (a sketch follows this list).
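To make the three elements concrete, here is a minimal sketch. The linear model, the squared-loss criterion, and gradient descent below are illustrative choices, not the only possibilities: the model is the hypothesis to be learned, the strategy is the criterion being minimized, and the algorithm is the procedure that does the minimizing.

```python
# Three elements of machine learning, illustrated with a toy regression task:
# model = a linear decision function, strategy = squared-loss empirical risk,
# algorithm = batch gradient descent. All choices here are illustrative.
import numpy as np

def model(X, w, b):
    """Model: the hypothesis f(x) = w.x + b to be learned."""
    return X @ w + b

def strategy(y_pred, y):
    """Strategy: the criterion used to choose the best model
    (here, mean squared error over the training set)."""
    return np.mean((y_pred - y) ** 2)

def algorithm(X, y, lr=0.1, steps=1000):
    """Algorithm: the concrete procedure that searches for the
    parameters minimizing the strategy (here, gradient descent)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        err = model(X, w, b) - y
        w -= lr * 2 * X.T @ err / len(y)
        b -= lr * 2 * err.mean()
    return w, b

# Toy data: y = 3x + 1 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + 1 + 0.01 * rng.standard_normal(100)
w, b = algorithm(X, y)
print(w, b)                          # close to [3.] and 1.0
print(strategy(model(X, w, b), y))   # small training risk
```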
Loss function and risk function
In machine learning, the loss function measures how good a single model prediction is, while the risk function measures how good the model's predictions are on average.
The smaller the loss, the better the model. Common loss functions are:
- 0-1 loss function
- Square loss function
- Absolute loss function
- Logarithmic loss function
The risk function is the expectation of the model's loss with respect to the joint distribution, also known as the expected risk. In practice, the empirical risk (the average loss of the model over the training set) is used to estimate the expected risk. This leads to two basic strategies: empirical risk minimization and structural risk minimization; the sketch below illustrates both.
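A minimal sketch of these definitions, assuming NumPy arrays of labels and predictions; the squared-norm regularizer and its weight in the structural risk are illustrative choices.

```python
# Common loss functions, plus empirical risk and structural risk.
# The regularizer J(f) = ||w||^2 and the weight lam are illustrative.
import numpy as np

def zero_one_loss(y, y_pred):
    return np.where(y == y_pred, 0.0, 1.0)

def squared_loss(y, f_x):
    return (y - f_x) ** 2

def absolute_loss(y, f_x):
    return np.abs(y - f_x)

def log_loss(y, p_y_given_x):
    # -log P(Y | X), evaluated at the probability the model assigns
    # to the true label y.
    return -np.log(p_y_given_x)

def empirical_risk(loss, y, f_x):
    """Average loss of the model over the training sample."""
    return np.mean(loss(y, f_x))

def structural_risk(loss, y, f_x, w, lam=0.1):
    """Empirical risk plus a model-complexity penalty lam * J(f)."""
    return empirical_risk(loss, y, f_x) + lam * np.sum(w ** 2)
```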
Training error and test error
- The training error is the average loss of the model over the training set. Its size is useful for judging whether a given problem is easy to learn, but it is not essentially important.
- The test error reflects the model's ability to predict an unknown test data set, which makes it an important concept in machine learning (see the sketch below).
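A minimal sketch: the two errors are the same average loss, evaluated on the training set and on held-out data respectively. The names `f`, `squared_loss`, and the data arrays are illustrative placeholders.

```python
# Training error vs. test error: the same average loss on different data.
import numpy as np

def average_loss(f, loss, X, y):
    """Average loss of model f over the data set (X, y)."""
    return np.mean(loss(y, f(X)))

# train_error = average_loss(f, squared_loss, X_train, y_train)
# test_error  = average_loss(f, squared_loss, X_test,  y_test)
```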
Over-fitting
Over-fitting refers to a model that predicts the known (training) data well but predicts unknown data poorly.
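To make this concrete, here is a small illustrative experiment (the sine curve, noise level, and polynomial degrees are arbitrary choices): the most flexible fit drives the training error toward zero while its test error grows.

```python
# Over-fitting sketch: a high-degree polynomial matches the few training
# points almost exactly (tiny training error) but generalizes poorly
# (large test error). Degrees 1, 3, 9 are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(n)
    return x, y

x_train, y_train = make_data(10)
x_test, y_test = make_data(100)

for degree in (1, 3, 9):
    coef = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    print(f"degree={degree}: train={train_err:.3f} test={test_err:.3f}")
# The degree-9 fit has near-zero training error but typically the
# largest test error: it has over-fitted the noise.
```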
Cross-validation
The basic idea of cross-validation is to reuse the data: split the given data, recombine the splits into training and test sets, and on that basis repeat training, testing, and model selection. Cross-validation can be divided into:
- Simple cross-validation
- S-fold cross-validation (see the sketch after this list)
- Leave-one-out cross-validation
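Here is a minimal sketch of S-fold cross-validation, where `fit` and `error` stand for an arbitrary training procedure and evaluation metric (placeholders, not a specific library API).

```python
# S-fold cross-validation: split the data into S disjoint folds, train on
# S-1 of them, test on the remaining one, and average the S test errors.
import numpy as np

def s_fold_cv(X, y, fit, error, S=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, S)          # S roughly equal index blocks
    scores = []
    for i in range(S):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(S) if j != i])
        model = fit(X[train_idx], y[train_idx])        # train on S-1 folds
        scores.append(error(model, X[test_idx], y[test_idx]))  # test on the rest
    return np.mean(scores)
```

Leave-one-out cross-validation is the special case in which S equals the number of samples.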
Generative models and discriminative models
Supervised learning methods can be divided into generative methods and discriminative methods; the models they learn are called generative models and discriminative models, respectively.
The generative method learns the joint probability distribution P(X, Y) from the data and then derives the conditional probability distribution P(Y | X) as the predictive model; such a model is a generative model. Typical generative models are the naive Bayes model and the hidden Markov model.
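Spelled out, the step from the learned joint distribution to the predictive conditional distribution is just Bayes' rule (a standard identity, not tied to any particular generative model):

$$
P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \frac{P(Y)\,P(X \mid Y)}{\sum_{y'} P(Y = y')\,P(X \mid Y = y')},
\qquad
\hat{y} = \arg\max_{y} P(Y = y \mid X = x).
$$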
The discriminative method learns the decision function f(x) or the conditional probability distribution P(Y | X) directly from the data as the predictive model; such a model is a discriminative model. Typical discriminative models include the k-nearest neighbor method, the perceptron, decision trees, logistic regression, the maximum entropy model, support vector machines, conditional random fields, and so on.