[Python] Data Mining (1): Solving logistic regression with gradient descent -- classifying exam scores

Source: Internet
Author: User

PS: This post is a summary of my reproduction of the material in Tang Yudi's classic machine learning algorithms video course and Http://www.abcplus.com.cn/course/83/tasks
Logistic regression

Problem description: We will build a logistic regression model to predict whether a student will be admitted to a university. Suppose you are an administrator of a university department and want to estimate each applicant's chance of admission from their scores on two exams. You have historical data on previous applicants that can serve as the training set for logistic regression. Each training example contains the applicant's scores on the two exams and the admission decision. From this we will build a classification model that estimates the probability of admission from the exam scores.

Data download: Https://pan.baidu.com/s/1pNbtrjP

The data looks roughly like this.

1. View the basic properties of the data

2. Plot the data and examine its structure

From https://www.jianshu.com/p/b4b5dd20e48a

3. Build the classifier (three parameters to solve for: θ0, θ1, θ2)

Set a threshold and decide the admission result from it (here 50%: a predicted probability ≥ 50% means admitted)

Modules to complete:

      • sigmoid: Maps a value to a probability

      • model: Returns the predicted result value

      • cost: Calculates the loss according to the parameters

      • gradient: Calculates the gradient direction for each parameter

      • descent: Updates the parameters

      • accuracy: Calculates the accuracy
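The first two modules in the list can be sketched as follows. This is a minimal NumPy sketch, not the course's exact code: it assumes each row of X starts with a constant 1 (so that θ0 acts as the intercept), and the two applicants' scores are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real value (or array) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def model(X, theta):
    """Predicted admission probability for each row of X.

    X has shape (n_samples, 3) with a leading column of ones (assumed layout);
    theta has shape (3,) and holds θ0, θ1, θ2.
    """
    return sigmoid(X @ theta)

# Hypothetical toy input: two applicants with made-up exam scores.
X = np.array([[1.0, 45.0, 85.0],
              [1.0, 30.0, 40.0]])
theta = np.zeros(3)

probs = model(X, theta)      # all-zero parameters give probability 0.5 for everyone
admitted = probs >= 0.5      # apply the 50% threshold
print(probs, admitted)
```

With all parameters at zero the model knows nothing yet, so every applicant sits exactly on the threshold; training is what moves θ away from this starting point.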

4. Loss function

5. Calculate gradient
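For the average log-loss, the partial derivative with respect to θj works out to (1/n) Σ (h(x) - y)·xj. Below is a minimal sketch under that assumption, together with a finite-difference check that the formula and the loss agree. The helper `numerical_gradient` and the small data set are hypothetical and only there for the check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, theta):
    h = sigmoid(X @ theta)
    return float(np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h)))

def gradient(X, y, theta):
    """Analytic gradient of the average log-loss: X^T (h - y) / n."""
    error = sigmoid(X @ theta) - y
    return X.T @ error / len(y)

def numerical_gradient(X, y, theta, eps=1e-6):
    """Central finite differences, used only to sanity-check `gradient`."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(X, y, theta + step) - cost(X, y, theta - step)) / (2 * eps)
    return grad

# Made-up, already rescaled scores so that the logs stay well-behaved.
X = np.array([[1.0, 0.45, 0.85],
              [1.0, 0.30, 0.40],
              [1.0, 0.60, 0.86],
              [1.0, 0.35, 0.50]])
y = np.array([1.0, 0.0, 1.0, 0.0])
theta = np.array([0.1, -0.2, 0.3])

analytic = gradient(X, y, theta)
numeric = numerical_gradient(X, y, theta)
print(analytic, numeric)
```

If the two vectors disagree, the bug is usually a sign error or a missing division by n.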

6. Gradient descent (comparison of three gradient descent methods)
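The three methods differ only in how many rows feed each update, so one function with a `batch_size` argument can cover all of them. The sketch below is an assumption about how this could be organized, not the course's code; the data is synthetic and the learning rate `alpha` and iteration count are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, theta):
    h = sigmoid(X @ theta)
    return float(np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h)))

def gradient(X, y, theta):
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def descent(X, y, batch_size, alpha, n_iter, seed=0):
    """Run n_iter parameter updates, each on `batch_size` rows.

    batch_size == len(y) -> batch gradient descent
    batch_size == 1      -> stochastic gradient descent
    anything in between  -> mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    theta = np.zeros(X.shape[1])
    order = rng.permutation(n)
    start = 0
    for _ in range(n_iter):
        if start + batch_size > n:       # not enough rows left: reshuffle and restart
            order = rng.permutation(n)
            start = 0
        idx = order[start:start + batch_size]
        theta = theta - alpha * gradient(X[idx], y[idx], theta)
        start += batch_size
    return theta

# Synthetic, made-up data: label is 1 when the two features sum to more than 0.
rng = np.random.default_rng(42)
features = rng.normal(size=(100, 2))
X = np.column_stack([np.ones(100), features])
y = (features[:, 0] + features[:, 1] > 0).astype(float)

theta_batch = descent(X, y, batch_size=100, alpha=0.1, n_iter=500)
theta_mini = descent(X, y, batch_size=16, alpha=0.1, n_iter=500)
theta_sgd = descent(X, y, batch_size=1, alpha=0.1, n_iter=500)
for name, theta in [("batch", theta_batch), ("mini-batch", theta_mini), ("stochastic", theta_sgd)]:
    print(name, round(cost(X, y, theta), 4))
```

Keeping the update rule identical and varying only `batch_size` makes the comparison fair: any difference in the loss curves comes from the sampling, not from the code path.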

The following code is just a visualization of the process

7. Different stopping strategies

① Stop after a set number of iterations

② Stop based on the change in the loss value

The number of iterations above is too low. With the threshold changed to 1E-6, about 110,000 iterations are needed, and the loss value decreases further.

This strategy is more accurate, but it takes many more iterations and therefore much more computation.

③ Stop based on the gradient

With the threshold set to 0.05, about 40,000 iterations are needed.
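The three stopping strategies can be expressed as one small check that the training loop calls after every update. This is a hedged sketch: the constant names and the function `should_stop` are hypothetical, and using the norm of the gradient vector for strategy ③ is an assumption about what "gradient change" means here.

```python
import numpy as np

STOP_ITER, STOP_COST, STOP_GRAD = 0, 1, 2   # hypothetical names for the three strategies

def should_stop(kind, value, threshold):
    """Return True when the chosen stopping rule is satisfied.

    STOP_ITER: value is the number of iterations done so far.
    STOP_COST: value is the list of loss values recorded so far.
    STOP_GRAD: value is the current gradient vector.
    """
    if kind == STOP_ITER:
        return value >= threshold
    if kind == STOP_COST:
        # Stop once the loss has practically stopped changing.
        return len(value) >= 2 and abs(value[-1] - value[-2]) < threshold
    if kind == STOP_GRAD:
        # Stop once the gradient is nearly flat (assumed: Euclidean norm).
        return bool(np.linalg.norm(value) < threshold)
    raise ValueError(f"unknown stopping rule: {kind}")

print(should_stop(STOP_ITER, 5000, 5000))
print(should_stop(STOP_COST, [0.63, 0.6299995], 1e-6))
print(should_stop(STOP_GRAD, np.array([0.03, 0.03]), 0.05))
```

A tighter threshold for ② or ③ means the loop runs longer before the rule fires, which matches the iteration counts reported above.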

8. Comparing different gradient descent methods

① Stochastic gradient descent

Quite unstable; try turning the learning rate down a little.

It is fast, but the result and stability are poor, and it needs a very small learning rate.

② Mini-batch gradient descent

Standardization/Normalization

The loss still fluctuates quite a lot, so let's try standardizing the data: for each attribute (column), subtract its mean and then divide by its standard deviation. As a result, the values of every attribute/column are centered around 0 with a variance of 1.
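A minimal NumPy sketch of this standardization step follows. Note that the divisor is the standard deviation (the square root of the variance), since that is what produces a variance of 1 per column. The function name `standardize` and the scores are made up for illustration.

```python
import numpy as np

def standardize(features):
    """Column-wise standardization: zero mean and unit variance per attribute."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / std

# Hypothetical exam scores, one row per applicant.
# Only the score columns are scaled; a constant column of ones would have
# a standard deviation of 0 and must be left out.
scores = np.array([[45.0, 85.0],
                   [30.0, 40.0],
                   [60.0, 86.0],
                   [35.0, 50.0]])
scaled = standardize(scores)
print(scaled.mean(axis=0), scaled.var(axis=0))
```

After scaling, both exam columns live on the same scale, so a single learning rate suits both parameters and the loss curve tends to be much smoother.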

It's much better! With the raw data the loss only got down to about 0.61, whereas here we reach 0.38! Preprocessing the data is therefore very important.

More iterations make the loss fall even further!

Stochastic gradient descent is faster, but it needs more iterations, so it is better to use batches instead!

9. Accuracy
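Accuracy here means the share of applicants whose thresholded prediction matches the recorded decision. A minimal sketch, assuming the 50% threshold from step 3; the parameter vector and the four applicants are hypothetical and not trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, theta, threshold=0.5):
    """Return 1 (admitted) where the predicted probability reaches the threshold."""
    return (sigmoid(X @ theta) >= threshold).astype(int)

def accuracy(X, y, theta):
    """Fraction of rows where the prediction equals the true label."""
    return float(np.mean(predict(X, theta) == y))

# Made-up data: a column of ones plus two exam scores per applicant.
X = np.array([[1.0, 45.0, 85.0],
              [1.0, 30.0, 40.0],
              [1.0, 60.0, 86.0],
              [1.0, 35.0, 50.0]])
y = np.array([1, 0, 1, 1])

# Hypothetical parameters: admit when the two scores sum to at least 100.
theta = np.array([-100.0, 1.0, 1.0])
acc = accuracy(X, y, theta)
print(acc)
```

Here the last applicant was admitted with a combined score of 85, so this rule misclassifies one of the four rows.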
