PS: This post is a summary written while reproducing Tang Yudi's classic machine learning algorithms video course, together with http://www.abcplus.com.cn/course/83/tasks
Logistic regression
Problem description: we will build a logistic regression model to predict whether a student is admitted to a university. Suppose you are the administrator of a university department and want to determine each applicant's chance of admission based on their results on two exams. You have historical data from previous applicants, which you can use as a training set for logistic regression. For each training example, you have the applicant's scores on the two exams and the admission decision. We will build a classification model that estimates the probability of admission from the exam scores.
Data download: https://pan.baidu.com/s/1pNbtrjP
The data probably looks like this.
1. View the basic properties of the data
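The original code blocks did not survive, so here is a sketch of this step. The filename `LogiReg_data.txt` and the column names `'Exam 1'`/`'Exam 2'`/`'Admitted'` are assumptions, and a synthetic array stands in for the download:

```python
import numpy as np
import pandas as pd

# In practice the file would be read like this (the raw data has no header row):
# pdData = pd.read_csv('LogiReg_data.txt', header=None,
#                      names=['Exam 1', 'Exam 2', 'Admitted'])

# Synthetic stand-in with the same shape: 100 applicants, two exam scores,
# admitted when the combined score is high enough.
rng = np.random.default_rng(0)
scores = rng.uniform(30, 100, size=(100, 2))
admitted = (scores.sum(axis=1) > 130).astype(int)
pdData = pd.DataFrame({'Exam 1': scores[:, 0],
                       'Exam 2': scores[:, 1],
                       'Admitted': admitted})

print(pdData.head())       # first five rows
print(pdData.shape)        # (100, 3)
print(pdData.describe())   # count / mean / std / min / quartiles / max
```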
2. Plot the data and inspect its structure
Figure from https://www.jianshu.com/p/b4b5dd20e48a
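A scatter plot like the one referenced above can be sketched as follows, again with synthetic data standing in for the download (column names are assumptions):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the exam data:
rng = np.random.default_rng(0)
scores = rng.uniform(30, 100, size=(100, 2))
admitted = (scores.sum(axis=1) > 130).astype(int)
pdData = pd.DataFrame({'Exam 1': scores[:, 0],
                       'Exam 2': scores[:, 1],
                       'Admitted': admitted})

positive = pdData[pdData['Admitted'] == 1]
negative = pdData[pdData['Admitted'] == 0]

# Admitted and rejected applicants in two colours/markers:
fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(positive['Exam 1'], positive['Exam 2'],
           s=30, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'],
           s=30, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
# plt.show()
```

If the two classes separate along a roughly linear boundary in this plot, a linear decision function under the sigmoid is a reasonable model.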
3. Build the classifier (solve for the three parameters θ0, θ1, θ2)
Set a threshold and decide admission based on it (here the threshold is 50%: a predicted probability ≥ 50% means admitted).
Modules to complete:
- `sigmoid`: maps a value to a probability
- `model`: returns the predicted value
- `cost`: computes the loss for the given parameters
- `gradient`: computes the gradient direction for each parameter
- `descent`: performs the parameter updates
- `accuracy`: computes the accuracy
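A minimal sketch of the first two modules; keeping θ as a `(1, n)` row vector is an assumption carried through the rest of the post:

```python
import numpy as np

def sigmoid(z):
    """Map any real value into (0, 1) so it can be read as a probability."""
    return 1 / (1 + np.exp(-z))

def model(X, theta):
    """Predicted admission probability h(x) = sigmoid(X . theta^T),
    with theta as a (1, n) row vector."""
    return sigmoid(np.dot(X, theta.T))
```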
4. Loss function
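The loss is the average cross-entropy; a sketch (the helpers are repeated so the block runs on its own):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def model(X, theta):
    return sigmoid(np.dot(X, theta.T))

def cost(X, y, theta):
    """Average cross-entropy loss:
    J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )."""
    left = np.multiply(-y, np.log(model(X, theta)))
    right = np.multiply(1 - y, np.log(1 - model(X, theta)))
    return np.sum(left - right) / len(X)

# With theta = 0 every prediction is 0.5, so the loss is ln 2 ≈ 0.693:
X = np.array([[1., 1., 2.], [1., 3., 4.]])   # first column is the intercept
y = np.array([[0.], [1.]])
print(cost(X, y, np.zeros((1, 3))))  # ≈ 0.6931
```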
5. Calculate gradient
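The gradient of the loss with respect to each θ_j is the average of (prediction − label) weighted by feature j; a self-contained sketch:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def model(X, theta):
    return sigmoid(np.dot(X, theta.T))

def gradient(X, y, theta):
    """Partial derivative of the loss for each theta_j:
    (1/m) * sum( (h(x_i) - y_i) * x_ij )."""
    grad = np.zeros(theta.shape)
    error = (model(X, theta) - y).ravel()
    for j in range(theta.shape[1]):
        grad[0, j] = np.sum(np.multiply(error, X[:, j])) / len(X)
    return grad

X = np.array([[1., 1., 2.], [1., 3., 4.]])
y = np.array([[0.], [1.]])
print(gradient(X, y, np.zeros((1, 3))))  # [[ 0.  -0.5 -0.5]]
```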
6. Gradient descent (comparison of three gradient descent methods)
The code that followed here simply visualized the descent process.
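Since the original code did not survive, here is a self-contained sketch in the spirit of the course implementation: one `descent` routine whose `batch_size` argument selects the flavour (full dataset → batch, 1 → stochastic, in between → mini-batch). The stopping rule here is simply a fixed iteration count, and synthetic data stands in for the download:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def model(X, theta):
    return sigmoid(np.dot(X, theta.T))

def cost(X, y, theta):
    left = np.multiply(-y, np.log(model(X, theta)))
    right = np.multiply(1 - y, np.log(1 - model(X, theta)))
    return np.sum(left - right) / len(X)

def gradient(X, y, theta):
    grad = np.zeros(theta.shape)
    error = (model(X, theta) - y).ravel()
    for j in range(theta.shape[1]):
        grad[0, j] = np.sum(np.multiply(error, X[:, j])) / len(X)
    return grad

def descent(data, theta, batch_size, n_iters, alpha):
    """Run n_iters updates; batch_size picks the method:
    len(data) -> batch, 1 -> stochastic, in between -> mini-batch."""
    data = data.copy()
    np.random.shuffle(data)
    k = 0                                  # cursor into the shuffled data
    X, y = data[:, :-1], data[:, -1:]
    costs = [cost(X, y, theta)]
    for _ in range(n_iters):
        grad = gradient(X[k:k + batch_size], y[k:k + batch_size], theta)
        k += batch_size
        if k >= len(data):                 # one pass finished: reshuffle
            k = 0
            np.random.shuffle(data)
            X, y = data[:, :-1], data[:, -1:]
        theta = theta - alpha * grad       # parameter update
        costs.append(cost(X, y, theta))    # track loss on the full data
    return theta, costs

# Tiny synthetic run (batch gradient descent on standardized features):
rng = np.random.default_rng(0)
raw = rng.uniform(30, 100, size=(100, 2))
labels = (raw.sum(axis=1) > 130).astype(float).reshape(-1, 1)
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)
data = np.hstack([np.ones((100, 1)), scaled, labels])  # intercept column first

theta, costs = descent(data, np.zeros((1, 3)), batch_size=100,
                       n_iters=1000, alpha=0.1)
print(costs[0], costs[-1])  # loss should drop from ~0.69
```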
7. Different stopping strategies ① Stop after a fixed number of iterations
② Stop based on the change in the loss value
The iteration count above is too low. If we instead set the loss-change threshold to 1e-6, it takes roughly 110,000 iterations, and the loss drops further.
This strategy is more accurate, but it requires many iterations and a large amount of computation.
③ Stop based on the change in the gradient
With the threshold set to 0.05, roughly 40,000 iterations are required.
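The three stopping policies can be sketched as one dispatch function (the flag names and the exact checks are assumptions, modelled on the course code):

```python
import numpy as np

STOP_ITER, STOP_COST, STOP_GRAD = 0, 1, 2

def stop_criterion(stop_type, value, threshold):
    """Decide when to halt, one check per policy described above."""
    if stop_type == STOP_ITER:    # value: current iteration count
        return value > threshold
    if stop_type == STOP_COST:    # value: history of loss values
        return abs(value[-1] - value[-2]) < threshold
    if stop_type == STOP_GRAD:    # value: latest gradient vector
        return np.linalg.norm(value) < threshold

# Thresholds matching the text:
print(stop_criterion(STOP_ITER, 5000, 5000))                    # False
print(stop_criterion(STOP_COST, [0.65, 0.65 + 5e-7], 1e-6))     # True
print(stop_criterion(STOP_GRAD, np.array([0.03, 0.02]), 0.05))  # True
```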
8. Comparing different gradient descent methods ① Stochastic gradient descent
Quite unstable; try turning the learning rate down.
It is fast, but the result and stability are poor, and it needs a very small learning rate.
② Mini-batch gradient descent
Normalization/Standardization
The loss still fluctuates quite a lot, so let's try standardizing the data: for each attribute (by column), subtract the column mean and divide by the column standard deviation. The result is that every attribute/column is centered around 0 with a variance of 1.
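The transformation just described is a few lines of NumPy (synthetic scores stand in for the real data):

```python
import numpy as np

# Column-wise standardization: subtract each attribute's mean and divide by
# its standard deviation, leaving every column centred at 0 with variance 1.
rng = np.random.default_rng(0)
scores = rng.uniform(30, 100, size=(100, 2))   # synthetic exam scores

scaled = (scores - scores.mean(axis=0)) / scores.std(axis=0)

print(scaled.mean(axis=0))  # ≈ [0. 0.]
print(scaled.std(axis=0))   # ≈ [1. 1.]
```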
Much better! On the raw data the loss could only get down to about 0.61, while here we reach 0.38! Preprocessing the data is therefore very important.
More iterations will bring the loss down even further!
Stochastic gradient descent drops faster per update, but it needs many more iterations, so (mini-)batch is the more appropriate choice!
9. Accuracy
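Prediction applies the 50% threshold from section 3, and accuracy is the fraction of correct predictions. A sketch with hypothetical fitted parameters (the θ values below are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def model(X, theta):
    return sigmoid(np.dot(X, theta.T))

def predict(X, theta):
    """Apply the 50% threshold: probability >= 0.5 -> admitted."""
    return [1 if p >= 0.5 else 0 for p in model(X, theta)]

# Hypothetical fitted parameters and two example applicants
# (intercept column first; theta is not from a real fit):
theta = np.array([[-25.0, 0.2, 0.2]])
X = np.array([[1., 95., 90.],    # strong scores
              [1., 40., 35.]])   # weak scores
y = np.array([1, 0])

preds = predict(X, theta)
correct = [1 if p == a else 0 for p, a in zip(preds, y)]
print('accuracy = {0}%'.format(100 * sum(correct) // len(correct)))  # 100%
```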
[Python] Data Mining (1): Solving logistic regression with gradient descent -- classifying exam scores