Logistic regression: related problems and a Java implementation


This article introduces common questions about logistic regression and walks through a concrete implementation.

1. What is logistic regression

Logistic regression is built on top of linear regression. So first: what are regression and linear regression?

Regression means the form of the formula is known and we estimate its unknown parameters from data. Note that the form must be known in advance; otherwise there is nothing to regress.

Linear regression is regression where the formula is of first degree in the features, for example z = ax + by.

Logistic regression simply applies a sigmoid function to the output of linear regression, squashing it into (0, 1). In detail, it looks like this:

h(x) = sigmoid(w · x) = 1 / (1 + e^(-w · x))
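As a minimal sketch of the formula above (the class and method names here are illustrative, not taken from the implementation in section 4):

public class SigmoidDemo {
    // sigmoid squashes any real number into the open interval (0, 1)
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // hypothesis h(x): linear combination w . x followed by the sigmoid
    static double hypothesis(double[] w, double[] x) {
        double z = 0;
        for (int i = 0; i < w.length; i++) {
            z += w[i] * x[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double[] w = { 0.5, -0.2 }; // example weights (arbitrary)
        double[] x = { 1.0, 3.0 };  // example feature vector (arbitrary)
        System.out.println(hypothesis(w, x)); // prints a value in (0, 1)
    }
}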


2. The regularization term

The purpose of introducing a regularization term is to prevent the model from overfitting. A model's fit to the samples falls into one of three cases.

Under-fitting: intuitively, the error on the training set is large; the function that should be fitted is a curve, but the result comes out as a straight line.

Over-fitting: the error on the training set is tiny or even zero because only empirical risk is minimized; the fitted model is very complex and often performs poorly on unseen samples.

Good fit: performs well on both the training set and the test set, balancing empirical risk against structural risk.

There are two ways to combat overfitting: reduce the feature dimension, or regularize. My understanding of dimension reduction is that overfitting happens when there are too many features and too few samples, so selecting fewer features yields a better fit. Regularization is covered in detail below.

First, what the regularized loss function looks like:

J(w) = Loss(w) + λ · ‖w‖

where ‖w‖ is the L1 or L2 norm of the weight vector w.


In fact, it just adds a regularization term to the loss function: the L1 or L2 norm of the weights multiplied by a λ that controls the proportion between the loss and the regularization term. The intuition: preventing overfitting means preventing the final trained model from depending too heavily on any one feature. When the loss function is minimized, some weight dimension can become very large so that the gap between the fitted values and the true values is tiny; regularization makes the overall cost of such a solution larger, and thus avoids results that rely too much on a single dimension. Of course, the premise of adding the penalty is that the feature values are normalized first: if one feature ranges over 200-500 and another over 0-1, they should be normalized to a common range such as 0-1.
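A minimal sketch of where the penalty enters the weight update, assuming the same gradient as the batch code in section 4; the lambda parameter and the class and method names are illustrative assumptions, not part of the original code:

class RegularizedStep {
    // One batch update with an L2 penalty. "out" holds the current sigmoid
    // outputs for all samples, computed as in the batch code of section 4.
    static void l2Step(float[] w, float[][] datas, float[] labels, float[] out,
                       float step, float lambda) {
        for (int d = 0; d < w.length; d++) {
            float sum = 0;
            for (int s = 0; s < datas.length; s++) {
                sum += (labels[s] - out[s]) * datas[s][d];
            }
            // the extra -lambda * w[d] term shrinks each weight toward zero,
            // so no single feature can dominate the fit
            w[d] += step * (sum - lambda * w[d]);
        }
    }
}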

3. Least squares and maximum likelihood method

Least squares is, admittedly, not the most self-explanatory name; it simply means minimizing the sum of squared errors. So why use least squares? Our goal is to make the difference between predicted and true values small. Can we just add up the raw differences as the error? No, because the errors have signs and would partly cancel out. Summing absolute values sounds more reasonable, and in theory it could work, but only least squares has a clean probabilistic justification.

Suppose we have sample points D and many candidate curves h that could separate them. Which curve should we choose? The one with the largest posterior probability, i.e. the h that maximizes P(h|D). By Bayes' rule, P(h|D) is proportional to P(h) · P(D|h). If the prior P(h) is taken to be the same for all candidates, we only need to maximize P(D|h), and since the sample points are independent, P(D|h) = P(d1|h) · P(d2|h) · ... · P(dn|h).

We assume the points are noisy: noise makes each point deviate from a perfect curve, and larger deviations should be less probable, which a normal distribution describes. Formally, P(di|h) ∝ exp(-δi²), so P(D|h) ∝ exp(-(δ1² + δ2² + ... + δn²)). Maximizing this probability is equivalent to minimizing the sum of squares inside: min(δ1² + δ2² + ... + δn²). Look familiar?
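The same chain of reasoning in compact form (a sketch assuming zero-mean Gaussian noise with constant variance, which is what makes the squared deviations appear):

P(h \mid D) \propto P(h)\, P(D \mid h), \qquad
P(D \mid h) = \prod_{i=1}^{n} P(d_i \mid h) \propto \exp\!\Big(-\sum_{i=1}^{n} \delta_i^2\Big)

\hat{h} = \arg\max_h P(D \mid h) = \arg\min_h \sum_{i=1}^{n} \delta_i^2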

At this point, is least squares suitable as the error function for logistic regression? The answer is no: the least-squares derivation assumes normally distributed errors, while the errors in logistic regression follow a binomial (Bernoulli) distribution. We therefore cannot use least squares as the loss function, and use maximum likelihood estimation instead.
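Concretely (a standard derivation, spelled out here because the text above only states the conclusion): with h(x) = sigmoid(w · x) and labels y in {0, 1}, each sample is Bernoulli-distributed, and maximizing the log-likelihood yields exactly the gradient the code in section 4 uses:

P(y \mid x; w) = h(x)^{\,y} \bigl(1 - h(x)\bigr)^{1-y}

\ell(w) = \sum_{i=1}^{n} \Bigl[ y_i \log h(x_i) + (1 - y_i) \log\bigl(1 - h(x_i)\bigr) \Bigr],
\qquad
\frac{\partial \ell}{\partial w_d} = \sum_{i=1}^{n} \bigl(y_i - h(x_i)\bigr)\, x_{id}

The term (labels[s] - out[s]) * datas[s][d] in the code below is exactly this partial derivative.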

4. Java implementation of the gradient descent method

Experiment:

Sample (each line holds two feature values and a 0/1 class label):

-0.017612 14.053064 0
-1.395634 4.662541 1
-0.752157 6.538620 0
-1.322371 7.152853 0
0.423363 11.054677 0
0.406704 7.067335 1
0.667394 12.741452 0
-2.460150 6.866805 1
0.569411 9.548755 0
-0.026632 10.427743 0
0.850433 6.920334 1
1.347183 13.175500 0
1.176813 3.167020 1
-1.781871 9.097953 0
-0.566606 5.749003 1
0.931635 1.589505 1
-0.024205 6.151823 1
-0.036453 2.690988 1
-0.196949 0.444165 1
1.014459 5.754399 1
1.985298 3.230619 1
-1.693453 -0.557540 1
-0.576525 11.778922 0
-0.346811 -1.678730 1
-2.124484 2.672471 1
1.217916 9.597015 0
-0.733928 9.098687 0
-3.642001 -1.618087 1
0.315985 3.523953 1
1.416614 9.619232 0
-0.386323 3.989286 1
0.556921 8.294984 1
1.224863 11.587360 0
-1.347803 -2.406051 1
1.196604 4.951851 1
0.275221 9.543647 0
0.470575 9.332488 0
-1.889567 9.542662 0
-1.527893 12.150579 0
-1.185247 11.309318 0
-0.445678 3.297303 1
1.042222 6.105155 1
-0.618787 10.320986 0
1.152083 0.548467 1
0.828534 2.676045 1
-1.237728 10.549033 0
-0.683565 -2.166125 1
0.229456 5.921938 1
-0.959885 11.555336 0
0.492911 10.993324 0
0.184992 8.721488 0
-0.355715 10.325976 0
-0.397822 8.058397 0
0.824839 13.730343 0
1.507278 5.027866 1
0.099671 6.835839 1
-0.344008 10.717485 0
1.785928 7.718645 1
-0.918801 11.560217 0
-0.364009 4.747300 1
-0.841722 4.119083 1
0.490426 1.960539 1
-0.007194 9.075792 0
0.356107 12.447863 0
0.342578 12.281162 0
-0.810823 -1.466018 1
2.530777 6.476801 1
1.296683 11.607559 0
0.475487 12.040035 0
-0.783277 11.009725 0
0.074798 11.023650 0
-1.337472 0.468339 1
-0.102781 13.763651 0
-0.147324 2.874846 1
0.518389 9.887035 0
1.015399 7.571882 0
-1.658086 -0.027255 1
1.319944 2.171228 1
2.056216 5.019981 1
-0.851633 4.375691 1
-1.510047 6.061992 0
-1.076637 -3.181888 1
1.821096 10.283990 0
3.010150 8.401766 1
-1.099458 1.688274 1
-0.834872 -1.733869 1
-0.846637 3.849075 1
1.400102 12.628781 0
1.752842 5.468166 1
0.078557 0.059736 1
0.089392 -0.715300 1
1.825662 12.693808 0
0.197445 9.744638 0
0.126117 0.922311 1
-0.679797 1.220530 1
0.677983 2.556666 1
0.761349 10.693862 0
-2.168791 0.143632 1
1.388610 9.341997 0
0.317029 14.739025 0
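The original post does not show how this sample is read in. Below is a minimal, hypothetical loader sketch, assuming the data sits in a whitespace-separated text file; the class name InstancesLoader and the file-reading approach are illustrative. It fills the Instances holder used by the main code below:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader: each line of the file holds "feature1 feature2 label".
class InstancesLoader {
    static Instances load(String path) throws IOException {
        List<float[]> rows = new ArrayList<>();
        List<Float> ys = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] f = line.trim().split("\\s+");
                if (f.length < 3) continue; // skip blank or malformed lines
                rows.add(new float[] { Float.parseFloat(f[0]), Float.parseFloat(f[1]) });
                ys.add(Float.parseFloat(f[2]));
            }
        }
        Instances ins = new Instances();
        ins.datas = rows.toArray(new float[0][]);
        ins.labels = new float[ys.size()];
        for (int i = 0; i < ys.size(); i++) {
            ins.labels[i] = ys.get(i);
        }
        return ins;
    }
}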

Main code

import java.util.Arrays;

// Minimal data holder assumed by the code below; the original post does not
// show how Instances is populated (see the loader sketch above for one option).
class Instances {
    public float[][] datas; // one row of feature values per sample
    public float[] labels;  // class label (0 or 1) per sample
}

public class LogRegression {

    public static void main(String[] args) {
        LogRegression lr = new LogRegression();
        Instances instances = new Instances();
        // The original call dropped the maxIt argument; 200 is an assumed value.
        lr.train(instances, 0.01f, 200, (short) 1);
    }

    public void train(Instances instances, float step, int maxIt, short algorithm) {
        float[][] datas = instances.datas;
        float[] labels = instances.labels;
        int size = datas.length;
        int dim = datas[0].length;
        float[] w = new float[dim];
        for (int i = 0; i < dim; i++) {
            w[i] = 1;
        }
        switch (algorithm) {
            case 1: // batch gradient descent: one update per pass over all samples
                for (int i = 0; i < maxIt; i++) {
                    // current outputs for all samples
                    float[] out = new float[size];
                    for (int s = 0; s < size; s++) {
                        float lire = innerProduct(w, datas[s]);
                        out[s] = sigmoid(lire);
                    }
                    for (int d = 0; d < dim; d++) {
                        float sum = 0;
                        for (int s = 0; s < size; s++) {
                            sum += (labels[s] - out[s]) * datas[s][d];
                        }
                        w[d] = w[d] + step * sum;
                    }
                    System.out.println(Arrays.toString(w));
                }
                break;
            case 2: // stochastic gradient descent: update after every sample
                for (int i = 0; i < maxIt; i++) {
                    for (int s = 0; s < size; s++) {
                        float lire = innerProduct(w, datas[s]);
                        float out = sigmoid(lire);
                        float error = labels[s] - out;
                        for (int d = 0; d < dim; d++) {
                            w[d] += step * error * datas[s][d];
                        }
                    }
                    System.out.println(Arrays.toString(w));
                }
                break;
        }
    }

    private float innerProduct(float[] w, float[] x) {
        float sum = 0;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * x[i];
        }
        return sum;
    }

    private float sigmoid(float src) {
        return (float) (1.0 / (1 + Math.exp(-src)));
    }
}
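After training, the learned weights can be used for prediction. A usage sketch (this classify helper is not part of the original code): a point is assigned class 1 when sigmoid(w · x) >= 0.5, which holds exactly when w · x >= 0, so the sigmoid call can be skipped.

class Prediction {
    // Hypothetical helper: classify a feature vector with the trained weights.
    static int classify(float[] w, float[] x) {
        float z = 0;
        for (int i = 0; i < w.length; i++) {
            z += w[i] * x[i];
        }
        return z >= 0 ? 1 : 0;
    }
}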

Effect


