Giving computers the ability to learn from data
This chapter covers:
1. General concepts of machine learning
2. The three types of machine learning and basic terminology
3. The building blocks required to successfully build a machine learning system
The three different types of machine learning
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
Predicting future events through supervised learning
The main goal of supervised learning is to build a model from labeled training data, so that the trained model can make predictions about future data. The term "supervised" means that every sample in the training data set has a known output (its class label).
1. Using classification to predict class labels
Classification is a subcategory of supervised learning whose goal is to predict the class labels of new samples, based on observing and learning from earlier samples whose class labels are known. These class labels are discrete, unordered values that can be understood as the group membership of a sample.
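To make the idea concrete, here is a minimal sketch of a classifier. It uses a 1-nearest-neighbor rule, which is only one of many possible classification methods and is not taken from the notes above. The data points, the labels "A" and "B", and the function name predict_label are all made up for illustration.

```python
import math

# Hypothetical training data: (feature vector, class label) pairs.
# The labels are discrete and unordered, as described above.
training_data = [
    ((1.0, 1.2), "A"),
    ((0.8, 0.9), "A"),
    ((3.1, 3.0), "B"),
    ((2.9, 3.3), "B"),
]


def predict_label(sample, labeled_examples):
    """Return the class label of the training example closest to `sample`."""
    _, nearest_label = min(
        labeled_examples,
        key=lambda example: math.dist(sample, example[0]),
    )
    return nearest_label


# A new, previously unseen sample gets the label of its nearest neighbor.
new_sample = (3.0, 2.8)
predicted = predict_label(new_sample, training_data)
print(predicted)
```

The new sample lies close to the two "B" examples, so the sketch assigns it to class "B".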
2. Using regression to predict continuous output values
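As a minimal illustration of regression, the sketch below fits a straight line to a handful of points with ordinary least squares and then predicts a continuous value for a new input. The data is invented (it follows y = 2x + 1 exactly) and the helper name fit_line is hypothetical.

```python
# Hypothetical data: the continuous target follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)

# Unlike a class label, the prediction is a continuous value.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```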
Solving interactive problems with reinforcement learning
The goal of reinforcement learning is to build a system (agent) that improves its performance through interaction with the environment. Because information about the current state of the environment usually includes a feedback (reward) signal, reinforcement learning can be regarded as a field related to supervised learning. However, in reinforcement learning this feedback is not a correct class label or continuous value, but a measure of how well the system's current action was rated by a reward function. By interacting with the environment, the agent can use reinforcement learning to learn a series of actions that maximizes positive feedback, either through exploratory trial and error or with the help of a carefully designed reward system.
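The trial-and-error idea can be sketched with a very small example. The code below uses an epsilon-greedy strategy, which is one common choice and not something the notes prescribe. The environment (three actions with fixed rewards), the constant TRUE_REWARDS and the function run_agent are assumptions made purely for illustration.

```python
import random

# Hypothetical environment: three actions, each with a fixed reward.
# The agent does not know these values; it only observes the reward
# it receives after taking an action.
TRUE_REWARDS = [0.1, 0.5, 0.9]


def run_agent(steps=500, epsilon=0.2, seed=0):
    """Learn reward estimates per action by epsilon-greedy trial and error."""
    rng = random.Random(seed)
    estimates = [0.0] * len(TRUE_REWARDS)  # estimated reward per action
    counts = [0] * len(TRUE_REWARDS)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(TRUE_REWARDS))
        else:
            # Exploit: take the action that currently looks best.
            action = max(range(len(estimates)), key=estimates.__getitem__)
        reward = TRUE_REWARDS[action]  # feedback from the environment
        counts[action] += 1
        # Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_agent()
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print(best_action, counts)
```

Note that the reward never tells the agent which action is correct. It only rates the action that was taken, and the agent has to discover the best action by trying the alternatives.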
Discovering the hidden structure of data through unsupervised learning
1. Finding subgroups in data through clustering
2. Dimensionality reduction for data compression
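As a rough sketch of clustering, the code below runs a basic k-means loop on unlabeled points. k-means is only one clustering method and is chosen here as an assumption. The points, the starting centroids and the function name k_means are invented for illustration.

```python
import math

# Hypothetical unlabeled samples that form two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]


def k_means(points, centroids, iterations=10):
    """Group points around centroids by alternating assignment and update."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Start from two of the samples as initial centroids (an arbitrary choice).
centroids, clusters = k_means(points, [points[0], points[3]])
print(clusters)
```

No class labels are used anywhere: the two groups are found from the distances between the samples alone.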
Blueprint for building machine learning systems
1. Data preprocessing
To perform at their best, machine learning algorithms place specific requirements on the format of the input data, but raw data rarely meets them. Data preprocessing is therefore one of the most important steps in any machine learning application.
Some features may be highly correlated and therefore partly redundant. In that case it is useful to compress the data into a lower-dimensional subspace using dimensionality reduction techniques. Dimensionality reduction not only reduces the required storage space, but also lets the learning algorithm run faster.
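One simple way to spot redundant features is to compute the correlation between them. The sketch below calculates the Pearson correlation coefficient by hand. The feature names and values are made up, and checking correlation is only a diagnostic here, not a dimensionality reduction technique in itself.

```python
import math

# Hypothetical features for five samples. Height in inches is just
# height in centimeters in another unit, so it adds no new information.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
height_in = [h / 2.54 for h in height_cm]
exam_score = [70.0, 55.0, 90.0, 60.0, 75.0]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r_redundant = pearson(height_cm, height_in)   # close to 1: redundant pair
r_unrelated = pearson(height_cm, exam_score)  # much weaker relationship
print(r_redundant, r_unrelated)
```

A coefficient close to 1 or -1 suggests that one of the two features could be dropped or merged with little loss of information.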
2. Selecting and training a predictive model
Many different machine learning algorithms have been developed to solve different kinds of problems. The "no free lunch" theorems sum up an important point: learning does not come for free. For example, every classification algorithm has its inherent limitations, and if we make no assumptions about the classification task, no single classification model has an advantage over the others.
3. Validating the model and predicting unseen data
After building a model on the training data set, we can use a test data set to estimate how well the model performs on unseen data, that is, to estimate its generalization error. If we are satisfied with the evaluation results, we can use the model to make predictions on new, future data. It is important to note that the parameters needed for the earlier steps, such as feature scaling and dimensionality reduction, must be obtained from the training data set only. The same parameters are then applied to the test data set and to new data samples. Otherwise, the performance measured on the test set may be overly optimistic.
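The rule about preprocessing parameters can be sketched with feature standardization. In the example below, the mean and standard deviation are computed from the training values only and then reused unchanged on the test values. The numbers and the function name standardize are hypothetical.

```python
import statistics

# Hypothetical values of a single feature, already split into two sets.
train_values = [2.0, 4.0, 6.0, 8.0]
test_values = [5.0, 10.0]

# Scaling parameters are estimated from the training set only.
train_mean = statistics.fmean(train_values)
train_std = statistics.pstdev(train_values)


def standardize(values, mean, std):
    """Rescale values using the given (training-set) mean and std."""
    return [(value - mean) / std for value in values]


# The same parameters are applied to both sets. The test set is never
# used to compute the mean or the standard deviation.
train_scaled = standardize(train_values, train_mean, train_std)
test_scaled = standardize(test_values, train_mean, train_std)
print(train_scaled, test_scaled)
```

If the parameters were recomputed on the test set, information about the test data would leak into the preprocessing step, and the evaluation would no longer reflect how the model behaves on truly unseen data.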
"Python Machine learning" notes (i)