Chapter 5: Logistic Regression


Chapter Content

- The sigmoid function and the logistic regression classifier
- A first look at optimization theory
- The gradient descent optimization algorithm
- Handling missing values in the data

This will be an exciting chapter, because here we encounter optimization algorithms for the first time. If you think about it, you will find that we meet optimization problems every day: how do we get from one place to another in the shortest time? How do we invest the least effort but gain the most benefit? How do we design an engine so that it uses the least fuel while delivering the most power? Clearly, optimization is powerful. Next, we introduce several optimization algorithms and use them to train a nonlinear function for classification. Readers who are not yet familiar with regression need not worry; Chapter 8 covers that subject in depth. Suppose we have some data points and we fit them with a straight line (called the best-fit line); this fitting process is called regression. The main idea of classifying with logistic regression is to build a regression formula for the classification boundary from the existing data. The word "regression" here comes from best fitting: we want to find the best-fitting set of parameters, and the mathematical analysis behind this is described in the next section. Training the classifier therefore amounts to finding these best-fit parameters with an optimization algorithm. Next we introduce the mathematical principles of this binary-output classifier.

This chapter first describes logistic regression and then introduces some optimization algorithms, including the basic gradient ascent method and an improved stochastic gradient ascent method, which will be used to train the classifier. At the end of the chapter, a logistic regression example is given that predicts whether a sick horse will survive.

5.1 Classification based on logistic regression and the sigmoid function

Advantages: computationally inexpensive; easy to understand and implement.
Disadvantages: prone to underfitting; classification accuracy may be low.

Applicable data types: numeric and nominal data.

The function we want should take all of the inputs and predict the class. For example, in the two-class case, this function should output 0 or 1. You may already have come across a function with this behavior, called the Heaviside step function, or simply the unit step function. The problem with the Heaviside step function is that it jumps from 0 to 1 instantaneously at the step point, and that jump is sometimes difficult to deal with. Fortunately, another function has similar behavior and is mathematically much easier to handle: the sigmoid function. The sigmoid function is given by the following formula:
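In standard notation, the sigmoid of an input z is

    σ(z) = 1 / (1 + e^(-z))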

Figure 5-1 shows two plots of the sigmoid function at different coordinate scales. When x is 0, the value of the sigmoid function is 0.5. As x increases, the sigmoid value approaches 1; as x decreases, it approaches 0. If the scale of the horizontal axis is large enough (Figure 5-1), the sigmoid function looks very much like a step function.
Therefore, to implement the logistic regression classifier, we multiply each feature by a regression coefficient, add up all of the results, and feed that sum into the sigmoid function. This yields a value between 0 and 1: any data with a value greater than 0.5 is assigned to class 1, and anything less than 0.5 is assigned to class 0. In this sense, logistic regression can also be viewed as a probability estimate.
Having settled on the functional form of the classifier, the question now becomes: what are the optimal regression coefficients, and how do we determine them? These questions are answered in the next section.

5.2 Determination of optimal regression coefficients based on optimization method

5.2.1 The gradient ascent method

Notes:

(1) The coefficient vector w must be given an initial value so that the iteration has somewhere to start.

(2) The step size α need not stay fixed; it can also change along the way.

(3) Both the coefficients w and the step size α can vary with the number of iterations.

(4) There is a limit on the number of iterations. The whole process works by iterating again and again, and it is this repeated iteration that drives the search for the regression coefficients.
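Putting these notes together: gradient ascent starts from an initial coefficient vector w and repeatedly moves it in the direction of the gradient of the objective function f, scaled by the step size α, stopping after a fixed number of iterations or when w stops changing appreciably. In standard notation the update rule is

    w := w + α ∇_w f(w)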

5.2.2 Training algorithm: using gradient ascent to find the best parameters

There are 100 sample points in Figure 5-3, each with two numeric features, X1 and X2. On this dataset, we will use the gradient ascent method to find the best regression coefficients, that is, the best parameters of the logistic regression model.

The pseudo-code for the gradient ascent method is as follows:
Initialize every regression coefficient to 1
Repeat R times:
    Calculate the gradient of the entire dataset
    Update the regression-coefficient vector by alpha * gradient
Return the regression coefficients

The following code is a concrete implementation of the gradient ascent algorithm. To see it in action, open a text editor, create a file named logRegres.py, and enter the following code:
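A minimal sketch of such a gradient ascent routine, assuming NumPy and the setup described in the next paragraph (a step size alpha, an iteration count maxCycles, and a 100 x 3 data matrix whose first column is a constant 1.0 for the intercept), could look like this; treat it as an illustration rather than the book's exact listing:

```python
import numpy as np

def sigmoid(inX):
    # The sigmoid maps any real-valued input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-inX))

def gradAscent(dataMatIn, classLabels, alpha=0.001, maxCycles=500):
    # Batch gradient ascent: every update uses the entire dataset.
    dataMatrix = np.asarray(dataMatIn, dtype=float)                  # shape (m, n), e.g. (100, 3)
    labelMat = np.asarray(classLabels, dtype=float).reshape(-1, 1)   # shape (m, 1)
    m, n = dataMatrix.shape
    weights = np.ones((n, 1))                                        # initialize every coefficient to 1
    for _ in range(maxCycles):
        h = sigmoid(dataMatrix @ weights)          # column vector of m predictions
        error = labelMat - h                       # prediction error for every sample
        weights = weights + alpha * (dataMatrix.T @ error)
    return weights
```

The update weights + alpha * Xᵀ(y - h) follows the gradient of the log-likelihood of the data, which is why the method is called gradient ascent rather than descent.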

The variable alpha is the step size of each move toward the target, and maxCycles is the number of iterations. When the for loop finishes, the trained regression coefficients are returned. It should be emphasized that the computation of h is a matrix operation: h is not a single number but a column vector whose length equals the number of samples, here 100. Correspondingly, the product of dataMatrix and weights is not a single multiplication; it actually involves 300 multiplications (100 samples times 3 columns).

5.2.3 Analyzing data: Drawing decision boundaries

Add the code from Listing 5-2 to logRegres.py, then run it from the Python prompt.
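Listing 5-2 is a routine that plots the data and the fitted decision boundary. A minimal sketch of such a routine, assuming Matplotlib and the convention above that the first column of the data matrix is the constant 1.0 for the intercept (loadDataSet, which reads the 100 points and their labels, is assumed rather than shown):

```python
import numpy as np
import matplotlib.pyplot as plt

def plotBestFit(weights, dataMat, labelMat):
    # Scatter the two classes, then draw the decision boundary, i.e. the line
    # where the sigmoid input is zero: w0 + w1*x1 + w2*x2 = 0.
    dataArr = np.asarray(dataMat, dtype=float)   # columns assumed to be [1.0, X1, X2]
    w = np.ravel(weights)                        # flatten to [w0, w1, w2]
    labels = np.asarray(labelMat).astype(int)
    ones = labels == 1
    fig, ax = plt.subplots()
    ax.scatter(dataArr[ones, 1], dataArr[ones, 2], s=30, c='red', marker='s', label='class 1')
    ax.scatter(dataArr[~ones, 1], dataArr[~ones, 2], s=30, c='green', label='class 0')
    x = np.arange(-3.0, 3.0, 0.1)
    y = (-w[0] - w[1] * x) / w[2]                # solve w0 + w1*x + w2*y = 0 for y
    ax.plot(x, y)
    ax.set_xlabel('X1'); ax.set_ylabel('X2')
    ax.legend()
    plt.show()
```

With the gradAscent sketch above, a session at the prompt would look roughly like:

>>> import logRegres
>>> dataArr, labelMat = logRegres.loadDataSet()
>>> weights = logRegres.gradAscent(dataArr, labelMat)
>>> logRegres.plotBestFit(weights, dataArr, labelMat)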

5.2.4 Training algorithm: stochastic gradient ascent, updating the coefficients one sample at a time
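The idea named in this heading, updating the weights from a single sample at a time rather than from the whole dataset, can be sketched as follows (a basic one-pass variant, reusing the sigmoid defined earlier; improved variants typically also shrink the step size alpha over time and pick samples at random):

```python
def stocGradAscent(dataMatrix, classLabels, alpha=0.01):
    # Stochastic gradient ascent: one pass over the data,
    # updating the weight vector after every individual sample.
    dataMatrix = np.asarray(dataMatrix, dtype=float)
    m, n = dataMatrix.shape
    weights = np.ones(n)
    for i in range(m):
        h = sigmoid(np.dot(dataMatrix[i], weights))   # scalar prediction for sample i
        error = classLabels[i] - h                    # scalar error
        weights = weights + alpha * error * dataMatrix[i]
    return weights
```

Each update touches only one sample, so the work per update grows with the number of features rather than with the size of the whole dataset; this is what makes the stochastic variant cheap and usable as an online algorithm, as the chapter summary notes.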


5.3 Example: predicting the mortality of horses with colic

This section uses logistic regression to predict the survival of horses suffering from colic. The data contains 368 samples and 28 features. I am no expert in horse husbandry, but from the literature I have learned that colic is a term used to describe gastrointestinal pain in horses. However, this condition does not necessarily originate in the horse's gastrointestinal tract; other problems can also cause colic. The dataset contains indicators recorded when hospitals examine horses for colic; some of these indicators are subjective, and some are difficult to measure, such as the horse's level of pain.

It should also be noted that, in addition to some indicators being subjective and hard to measure, the data has another problem: a considerable proportion of its values is missing. The following first describes how to deal with missing values in a dataset, and then uses logistic regression and stochastic gradient ascent to predict whether a sick horse will live or die.

5.3.1 Preparing the data: handling missing values

Missing values in data are a tricky issue, and there is a great deal of literature devoted to the problem. Why is missing data a problem at all? Suppose we have 100 samples and 20 features, and the data was collected by machine. What should we do if a damaged sensor causes one feature of a sample to be invalid? Should we throw away the entire sample? If so, what about the other 19 features: are they still usable? The answer is yes. Data is often expensive, and throwing it away and collecting it again may not be an option, so we need ways of working around the problem.

Here are some options (a small sketch of the first one follows this list):
- Use the feature's mean value, computed from the available data, to fill in the missing values;
- Fill missing values with a special value, such as -1;
- Ignore samples that have missing values;
- Fill missing values with the mean of similar samples;
- Use another machine learning algorithm to predict the missing values.
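As an illustration of the first option, here is a small NumPy sketch; it assumes missing entries are encoded as NaN, which is only a convention chosen for this example:

```python
import numpy as np

def fill_with_feature_means(X):
    # Replace each missing entry (NaN) with the mean of its column,
    # computed over the rows where that feature is present.
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)        # per-feature mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

# Example: the NaN in column 0 becomes 2.0 (mean of 1.0 and 3.0),
# the NaN in column 1 becomes 3.0 (mean of 2.0 and 4.0).
print(fill_with_feature_means([[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan]]))
```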

5.3.2 Test algorithm: Classification with logistic regression

The previous sections of this chapter described the optimization algorithms, but so far we have not actually tried them on a classification task. Classifying with logistic regression does not require much work: all that is needed is to multiply each feature of a test vector by the corresponding regression coefficient obtained from the optimization method, sum the products, and feed the result into the sigmoid function. If the resulting sigmoid value is greater than 0.5, the predicted class label is 1; otherwise it is 0. To see the actual effect, open a text editor and add the following code to logRegres.py.
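A minimal sketch of such a classification helper, following the description above and reusing the sigmoid defined earlier (the weights are the regression coefficients returned by the training routine):

```python
def classifyVector(inX, weights):
    # Weighted sum of the features pushed through the sigmoid;
    # a probability above 0.5 is labeled class 1, otherwise class 0.
    prob = sigmoid(np.sum(np.asarray(inX, dtype=float) * np.ravel(weights)))
    return 1.0 if prob > 0.5 else 0.0
```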

5.4 Summary of this chapter

The purpose of logistic regression is to find the best-fitting parameters of a nonlinear function, the sigmoid, and this search can be carried out by an optimization algorithm. Among optimization algorithms, the most common is gradient ascent, which can in turn be simplified to stochastic gradient ascent.
Stochastic gradient ascent achieves results comparable to gradient ascent while consuming far fewer computing resources. Moreover, stochastic gradient ascent is an online algorithm: it can update the model as new data arrives, without rereading the entire dataset for batch processing.
An important problem in machine learning is how to handle missing data. There is no single answer to this question; it depends on the needs of the actual application. A number of solutions are available, each with its own pros and cons.
The next chapter introduces another classification algorithm similar to logistic regression: the support vector machine, which is considered to be one of the best algorithms currently available.
