Logistic regression is often used for classification problems, the simplest of which are binary classification questions: Is this email spam? Did the team win or lose the game?
For a linear regression problem, z = w0*x0 + w1*x1 + w2*x2 + ...
Typically the parameter vector w is learned by least squares so that, given an input x, we can predict the value of z, whose range is (-∞, +∞). For a classification problem, however, the predicted values are discrete. By introducing the S-shaped sigmoid function we compress the output into a value y in (0, 1), so that:
when y >= 0.5, the sample is classified as a positive example;
when y < 0.5, the sample is classified as a negative example.
In this way the prediction problem is transformed into a classification problem.
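As a quick numeric illustration of this mapping (a minimal sketch; the sample z values are arbitrary):

```python
import numpy as np

# The sigmoid squashes any real z into (0, 1).
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
y = 1.0 / (1.0 + np.exp(-z))
print(y)         # ≈ [0.0067 0.2689 0.5 0.7311 0.9933]
print(y >= 0.5)  # [False False  True  True  True] -> predicted class labels
```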
The prediction function is then written as

y = 1 / (1 + e^(-z))

where z = ω^T x, ω is the parameter column vector and x is the sample vector. The probability that a sample x_j is a positive example can then be expressed as P(y = 1 | x_j; ω) = 1 / (1 + e^(-ω^T x_j)):
```python
import numpy as np

def predict(x, w):
    # Sigmoid of the linear combination z = x . w
    return 1.0 / (1.0 + np.e ** (-x.dot(w)))
```
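For example, calling it with a hypothetical sample and weight vector (both made up for illustration):

```python
x = np.array([1.0, 2.0, 3.0])   # one sample's features
w = np.array([0.1, -0.2, 0.3])  # a parameter vector
print(predict(x, w))            # sigmoid(x . w) = sigmoid(0.6) ≈ 0.6457
```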
We use the squared loss function, J(w) = (1 / (2m)) * Σ_j (h(x_j) - y_j)^2, where h is the prediction function and m is the number of samples:
```python
def cost(x, y, w):
    # Mean squared loss over the batch
    m = y.size
    prediction = predict(x, w)
    error = prediction - y
    return (1.0 / (2.0 * m)) * error.T.dot(error)
```
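A quick sanity check on a toy batch (the values are invented for the demo): with all-zero weights every prediction is 0.5, so the cost can be verified by hand.

```python
X = np.array([[1.0, 2.0],
              [1.0, -1.0]])  # two samples, two features
y = np.array([1.0, 0.0])     # their labels
w = np.zeros(2)              # all-zero initial weights
print(cost(X, y, w))         # (1/(2*2)) * (0.25 + 0.25) = 0.125
```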
Now the question is how to find w. Looking back at the expression, we can see that the smaller the loss function, the better our predictions will be. Because the loss function is convex, it has a global optimum, which makes the problem tractable: we can use stochastic gradient descent (SGD) and approach the minimum by iteratively updating w.
Each sample triggers the update w := w + α * (y - h(x)) * x, where α is the step size, also known as the learning rate; the factor next to α is the gradient term computed from the loss function.
```python
def iter_w(x, y, a, w):
    # One SGD update from a single sample
    prediction = predict(x, w)
    g = (y - prediction) * x
    w = w + a * g * (1.0 / y.size)
    return w
```
Then iterate; max_epochs is the maximum number of passes (epochs) over the training set:
```python
counter = 0
while counter < max_epochs:
    counter += 1
    for i in range(len(y)):
        w = iter_w(X[i, :], y[i], a, w)
```
In actual training, we need to test the effect of different step sizes on the learning results and then choose an appropriate one. k-fold cross-validation (for example, scikit-learn's KFold) is a convenient way to do this.
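For instance, here is a sketch of comparing candidate step sizes with k-fold cross-validation, assuming the predict, cost, and iter_w functions defined above and a dataset X, y already loaded (the candidate rates, fold count, and epoch count are arbitrary choices):

```python
import numpy as np
from sklearn.model_selection import KFold

def train(X, y, a, max_epochs=100):
    # The SGD loop from above, wrapped as a function.
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        for i in range(len(y)):
            w = iter_w(X[i, :], y[i], a, w)
    return w

def cv_cost(X, y, a, n_splits=5):
    # Average validation cost of step size `a` over the folds.
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    costs = []
    for train_idx, val_idx in kf.split(X):
        w = train(X[train_idx], y[train_idx], a)
        costs.append(cost(X[val_idx], y[val_idx], w))
    return np.mean(costs)

for a in (0.001, 0.01, 0.1, 1.0):  # candidate step sizes
    print(a, cv_cost(X, y, a))
```

The step size with the lowest average validation cost is the one to keep.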
This completes an implementation of logistic regression with SGD.
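Putting the pieces together, here is a self-contained sketch on synthetic data (the data generation, "true" weights, and hyperparameters are invented for the demo; the functions are the ones defined above):

```python
import numpy as np

def predict(x, w):
    return 1.0 / (1.0 + np.exp(-x.dot(w)))

def iter_w(x, y, a, w):
    prediction = predict(x, w)
    g = (y - prediction) * x
    return w + a * g * (1.0 / y.size)

rng = np.random.RandomState(42)
X = rng.randn(200, 2)
X = np.hstack([np.ones((200, 1)), X])          # prepend x0 = 1 as a bias feature
true_w = np.array([0.5, 2.0, -1.0])            # "ground truth" weights for the demo
y = (predict(X, true_w) >= 0.5).astype(float)  # labels from the known model

w = np.zeros(3)
a, max_epochs, counter = 0.1, 100, 0
while counter < max_epochs:
    counter += 1
    for i in range(len(y)):
        w = iter_w(X[i, :], y[i], a, w)

accuracy = np.mean((predict(X, w) >= 0.5) == y)
print("learned w:", w, "training accuracy:", accuracy)
```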