What is machine learning
1. Concept of machine learning
Machine learning means having a computer learn from one part of the data and then make predictions and judgments about other data.
The core of machine learning is "using algorithms to analyze data, learn from it, and then make decisions or predictions on new data". In other words, the computer derives a model from existing data and then uses that model to make predictions. This process resembles human learning: people accumulate experience and use it to make judgments about new problems.
Let's take an example. Many people know Alipay's Spring Festival "Five Fu" campaign, in which users scan the character "Fu" with a mobile phone and the app recognizes it. This recognition is a machine learning method. We provide the computer with photos of the character "Fu" as training data, and by training an algorithmic model the system keeps updating what it has learned. When a new photo is then input, the machine automatically identifies whether the character "Fu" appears in it.
Machine learning is an interdisciplinary subject involving probability theory, statistics, computer science, and other disciplines. The idea is to train a model on a large amount of training data so that the model captures the underlying patterns in the data and can then accurately classify or predict new input data. As shown in the figure below:
2. Classification of machine learning
Now that we understand the concept of machine learning, in which a system learns by building models, what learning methods are there?
(1) Supervised learning
Supervised learning trains a machine learning model on sample data in which every training sample has a corresponding target value. It establishes the connection between the features of the data samples and the known results, extracts the feature values and the mapping relationship, and, by repeatedly learning from the known samples and known results, predicts the results for new data.
Supervised learning is often used for classification and regression. Take spam detection for text messages or e-mail: historical SMS messages and e-mails are marked as spam or not spam, a model is trained on this labeled data, and when a new message or e-mail arrives, the model is applied to it to identify whether it is spam. This is classification under supervised learning.
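The spam example can be sketched in a few lines. The block below is a minimal illustration, not a production filter: it assumes a tiny invented set of labeled messages and uses a naive Bayes classifier with add-one smoothing, which is only one of many possible classification algorithms. The names TRAINING_DATA, train and predict are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled history: (message text, label). Invented for illustration.
TRAINING_DATA = [
    ("win a free prize now", "spam"),
    ("free cash claim your prize", "spam"),
    ("limited offer win cash now", "spam"),
    ("meeting moved to monday morning", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train(samples):
    """Count words per label and messages per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts = model
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.split():
            # Add-one smoothing so an unseen word does not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
print(predict(model, "claim your free prize"))
print(predict(model, "project meeting on monday"))
```

The labeled messages play the role of the historical SMS or e-mail data, and predict is the step applied to each new message.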
Let's take a regression example. Suppose we want to forecast a company's net profit. We can use the company's historical profit (the target value) together with profit-related indicators such as operating revenue, assets and liabilities, and management expenses. Regression fits an equation that relates company profit to these factors, and the fitted equation is then used to forecast the company's profit.
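As a rough sketch of the regression idea, the block below fits a straight line by ordinary least squares. It is simplified on purpose: it assumes a single indicator (operating revenue) instead of several, and the figures in HISTORY are invented. The names HISTORY, fit_line and predict are hypothetical.

```python
# Hypothetical history: (operating revenue, net profit) per year, same currency unit.
# The points are chosen to lie exactly on a line so the result is easy to check.
HISTORY = [(100.0, 12.0), (150.0, 17.0), (200.0, 22.0), (250.0, 27.0)]


def fit_line(points):
    """Ordinary least squares for profit = intercept + slope * revenue."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, revenue):
    """Apply the fitted equation to a new revenue figure."""
    intercept, slope = model
    return intercept + slope * revenue


model = fit_line(HISTORY)
print(model)
print(predict(model, 300.0))
```

With several indicators, the same idea becomes multiple linear regression, where one coefficient is fitted per indicator.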
The difficulty of supervised learning is the high cost of obtaining sample data with target values, because these training sets rely on manual annotation.
(2) Unsupervised learning
The difference between unsupervised learning and supervised learning is that the sample data does not need target values. We do not analyze the effect of the data on some result. We only analyze the inherent patterns in the data itself.
Unsupervised learning is often used for cluster analysis, such as customer clustering, and for reducing the number of factors (dimensionality reduction). For example, the RFM model groups customers by their purchasing behavior (purchase frequency, time of the most recent purchase, and purchase amount):
Important value customers: the most recent purchase is recent, and both purchase frequency and purchase amount are high;
Important customers to maintain: the most recent purchase was a long time ago, but purchase frequency and amount are both high, which indicates a loyal customer who has not come back for a while. We need to actively keep in touch with them;
Important development customers: the most recent purchase is recent and the purchase amount is high, but frequency and loyalty are low, so they should be a focus of development;
Important customers to win back: the most recent purchase was a long time ago and the purchase frequency is low, but the purchase amount is high. These customers may be about to leave or may already have left, and retention measures should be taken.
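The four segments above can be expressed as simple rules on recency, frequency and amount. The sketch below is rule-based thresholding, not a clustering algorithm such as k-means. The thresholds RECENT_DAYS, MIN_FREQUENCY and MIN_AMOUNT and the customer records are invented assumptions, and in practice the cut-offs would be derived from the data.

```python
# Hypothetical cut-offs; real projects would derive them from the data.
RECENT_DAYS = 30      # a purchase within this many days counts as recent
MIN_FREQUENCY = 10    # at least this many purchases counts as frequent
MIN_AMOUNT = 500.0    # at least this total spend counts as a high amount

# Hypothetical customers: (name, days since last purchase, purchase count, total amount).
CUSTOMERS = [
    ("A", 5, 20, 900.0),
    ("B", 90, 18, 850.0),
    ("C", 7, 2, 800.0),
    ("D", 120, 1, 950.0),
    ("E", 10, 3, 50.0),
]


def segment(recency_days, frequency, amount):
    """Map one customer's R, F and M values to a segment name."""
    recent = recency_days <= RECENT_DAYS
    frequent = frequency >= MIN_FREQUENCY
    if amount < MIN_AMOUNT:
        return "other"        # all four segments in the text have a high amount
    if recent and frequent:
        return "value"        # recent, frequent, high amount
    if frequent:
        return "maintain"     # not recent, but frequent and high amount
    if recent:
        return "develop"      # recent and high amount, but infrequent
    return "win back"         # not recent, infrequent, high amount


segments = {name: segment(r, f, m) for name, r, f, m in CUSTOMERS}
print(segments)
```

A clustering algorithm would instead discover the groups from the R, F and M values without fixed thresholds, which is what makes the approach unsupervised.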
In addition, unsupervised learning is also suitable for dimensionality reduction. Compared with supervised learning, unsupervised learning has the advantages that the data needs no manual labeling and is cheap to obtain.
(3) Semi-supervised learning
Semi-supervised learning is a learning method that combines supervised learning and unsupervised learning. It can be applied to classification, regression, clustering, and dimensionality reduction:
Semi-supervised classification: with the help of samples without class labels, it trains on samples with class labels and obtains a better classifier than one trained on the labeled samples alone;
Semi-supervised regression: with the help of inputs that have no output values, it trains on inputs that have output values and obtains better regression performance than training on the inputs with outputs alone;
Semi-supervised clustering: with the help of information from samples with class labels, it obtains better clusters than clustering on samples without class labels alone, which improves the accuracy of the clustering method;
Semi-supervised dimensionality reduction: with the help of information from samples with class labels, it finds the low-dimensional structure of high-dimensional input data while keeping the structure of the original high-dimensional data and the pairwise constraints unchanged.
Semi-supervised learning has become a popular approach in recent years.
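One simple way to picture semi-supervised classification is self-training: fit a model on the few labeled samples, let it label the unlabeled samples it is confident about, and refit. The sketch below assumes one-dimensional data, a nearest-centroid classifier and a fixed confidence margin. All of these are illustrative choices, and the data and the names LABELED, UNLABELED and self_train are hypothetical.

```python
# Hypothetical data: two labeled points and eight unlabeled feature values.
LABELED = [(1.0, "low"), (6.0, "high")]
UNLABELED = [0.0, 2.0, 3.0, 4.0, 7.0, 8.0, 9.0, 10.0]


def centroids(labeled):
    """Mean feature value per class."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def self_train(labeled, unlabeled, rounds=5, margin=2.0):
    """Repeatedly pseudo-label the unlabeled points the model is confident about."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(rounds):
        model = centroids(labeled)
        confident, uncertain = [], []
        for x in remaining:
            distances = sorted(abs(x - c) for c in model.values())
            # Confident when the nearest centroid is clearly closer than the runner-up.
            if distances[1] - distances[0] >= margin:
                confident.append((x, classify(model, x)))
            else:
                uncertain.append(x)
        if not confident:
            break
        labeled.extend(confident)
        remaining = uncertain
    return centroids(labeled)


supervised_only = centroids(LABELED)
semi_supervised = self_train(LABELED, UNLABELED)
print(classify(supervised_only, 3.8), classify(semi_supervised, 3.8))
```

In this toy data the unlabeled points shift the class centroids, so a query at 3.8 is assigned differently by the two models. Self-training can also reinforce its own mistakes, which is why only confident predictions are added in each round.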
(4) Reinforcement learning
Reinforcement learning is a more complex machine learning method that emphasizes continuous interaction and feedback between the system and its environment. It mainly targets scenarios that require reasoning throughout a process, such as autonomous driving, and it pays more attention to performance. It is currently a very active area of machine learning.
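The interaction-and-feedback loop can be illustrated with a toy problem that is far simpler than driving. The sketch below assumes a corridor of five positions in which the agent is rewarded only for reaching the right-hand end, and it uses tabular Q-learning, one common reinforcement learning algorithm. The environment, the parameter values and all names are hypothetical.

```python
import random

N_STATES = 5          # positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, +1)    # index 0 steps left, index 1 steps right


def step(state, action):
    """Environment: move, stay inside the corridor, reward 1 only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))    # explore
            else:
                best = max(q[state])                    # exploit, random tie-break
                ties = [i for i, value in enumerate(q[state]) if value == best]
                action = rng.choice(ties)
            next_state, reward, done = step(state, ACTIONS[action])
            # Feedback from the environment updates the value estimate.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


q_table = train()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

The agent is never told which action is correct. It only receives a reward after acting, and the learned table gradually comes to favor the actions that lead to the goal.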