Implementing a linear regression algorithm in Java

Source: Internet
Author: User

I found this implementation of one-variable (simple) linear regression written in Java on the Internet and think it is very useful. Some enterprises do not use data mining, yet they still need functions such as predicting operating income. A linear regression algorithm can compute this kind of prediction. Here is the code:

1. Define a DataPoint class to encapsulate the x and y coordinates of a point:

/**
 * File: DataPoint.java
 * Author: zhouyujie
 * Date: 16:00:00
 * Description: Java implementation of the unary linear regression algorithm;
 * coordinate point entity class (can be used to predict statistical indicators).
 */
package com.zyujie.dm;

public class DataPoint {
    /** the x value */
    public float x;
    /** the y value */
    public float y;

    /**
     * Constructor.
     *
     * @param x the x value
     * @param y the y value
     */
    public DataPoint(float x, float y) {
        this.x = x;
        this.y = y;
    }
}

2. The following is the RegressionLine class, which implements the algorithm:

/**
 * File: RegressionLine.java
 * Author: zhouyujie
 * Date: 16:00:00
 * Description: Java implementation of the unary linear regression algorithm;
 * regression implementation class (can be used to predict statistical indicators).
 */
package com.zyujie.dm;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

public class RegressionLine // implements Evaluatable
{
    /** sum of x */
    private double sumX;
    /** sum of y */
    private double sumY;
    /** sum of x * x */
    private double sumXX;
    /** sum of x * y */
    private double sumXY;
    /** sum of y * y */
    private double sumYY;
    /** sum of (yi - y)^2 over all points, where y is the fitted value (SSE) */
    private double sumDeltaY2;
    /** total sum of squares */
    private double sst;
    /** coefficient of determination R^2 */
    private double e;
    /** every data point, kept so that R^2 can be computed from the residuals */
    private final List<DataPoint> points = new ArrayList<>();
    private int xMin, xMax, yMin, yMax;
    /** line coefficient a0 */
    private float a0;
    /** line coefficient a1 */
    private float a1;
    /** number of data points */
    private int pn;
    /** true if coefficients valid */
    private boolean coefsValid;

    /** Constructor. */
    public RegressionLine() {
    }

    /**
     * Constructor.
     *
     * @param data the array of data points
     */
    public RegressionLine(DataPoint[] data) {
        for (DataPoint point : data) {
            addDataPoint(point);
        }
    }

    /** @return the current number of data points */
    public int getDataPointCount() { return pn; }

    /** @return the value of coefficient a0 */
    public float getA0() {
        validateCoefficients();
        return a0;
    }

    /** @return the value of coefficient a1 */
    public float getA1() {
        validateCoefficients();
        return a1;
    }

    public double getSumX() { return sumX; }
    public double getSumY() { return sumY; }
    public double getSumXX() { return sumXX; }
    public double getSumXY() { return sumXY; }
    public double getSumYY() { return sumYY; }
    public int getXMin() { return xMin; }
    public int getXMax() { return xMax; }
    public int getYMin() { return yMin; }
    public int getYMax() { return yMax; }

    /**
     * Add a new data point: update the sums and the x/y ranges.
     *
     * @param dataPoint the new data point
     */
    public void addDataPoint(DataPoint dataPoint) {
        sumX += dataPoint.x;
        sumY += dataPoint.y;
        sumXX += dataPoint.x * dataPoint.x;
        sumXY += dataPoint.x * dataPoint.y;
        sumYY += dataPoint.y * dataPoint.y;

        int x = (int) dataPoint.x;
        int y = (int) dataPoint.y;
        if (pn == 0) {
            xMin = xMax = x;
            yMin = yMax = y;
        } else {
            xMin = Math.min(xMin, x);
            xMax = Math.max(xMax, x);
            yMin = Math.min(yMin, y);
            yMax = Math.max(yMax, y);
        }

        // Save every point for the R^2 calculation and echo it to the console.
        points.add(dataPoint);
        System.out.println(x + "," + y);

        ++pn;
        coefsValid = false;
    }

    /**
     * Return the value of the regression line function at x.
     * (Implementation of Evaluatable.)
     *
     * @param x the value of x
     * @return the value of the function at x
     */
    public float at(float x) {
        if (pn < 2) {
            return Float.NaN;
        }
        validateCoefficients();
        return a0 + a1 * x;
    }

    /** Reset: clear all sums, ranges and stored points. */
    public void reset() {
        pn = 0;
        sumX = sumY = sumXX = sumXY = sumYY = 0;
        xMin = xMax = yMin = yMax = 0;
        points.clear();
        coefsValid = false;
    }

    /** Validate the coefficients: calculate a1 and a0 in the equation y = a1 * x + a0. */
    private void validateCoefficients() {
        if (coefsValid) {
            return;
        }
        if (pn >= 2) {
            float xBar = (float) sumX / pn;
            float yBar = (float) sumY / pn;
            a1 = (float) ((pn * sumXY - sumX * sumY) / (pn * sumXX - sumX * sumX));
            a0 = yBar - a1 * xBar;
        } else {
            a0 = a1 = Float.NaN;
        }
        coefsValid = true;
    }

    /** Return the coefficient of determination R^2 = 1 - SSE / SST, rounded to 4 places. */
    public double getR() {
        // Sum the squared residuals over all pn points, starting from zero on every call.
        sumDeltaY2 = 0;
        for (DataPoint point : points) {
            float deltaY = point.y - at(point.x);
            sumDeltaY2 += deltaY * deltaY;
        }
        sst = sumYY - (sumY * sumY) / pn;
        e = 1 - sumDeltaY2 / sst;
        return round(e, 4);
    }

    /** Precise rounding of a double to the given number of decimal places. */
    public double round(double v, int scale) {
        if (scale < 0) {
            throw new IllegalArgumentException("The scale must be a positive integer or zero");
        }
        return new BigDecimal(Double.toString(v)).setScale(scale, RoundingMode.HALF_UP).doubleValue();
    }

    /** Precise rounding of a float to the given number of decimal places. */
    public float round(float v, int scale) {
        if (scale < 0) {
            throw new IllegalArgumentException("The scale must be a positive integer or zero");
        }
        return new BigDecimal(Float.toString(v)).setScale(scale, RoundingMode.HALF_UP).floatValue();
    }
}
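The getR() method has to keep every data point in memory so it can add up the squared residuals. As a sketch of an alternative, assuming only the running sums are available: expanding SSE for the least-squares slope gives SSE = Syy - a1 * Sxy, where Sxx = sumXX - sumX^2 / n, Sxy = sumXY - sumX * sumY / n and Syy = sumYY - sumY^2 / n (which is SST). The class and method names below are hypothetical and not part of the original code.

```java
// Hypothetical helper, not part of the original RegressionLine: computes R^2
// from the five running sums alone, so no per-point storage is needed.
final class SumsOnlyRSquared {

    /** Returns R^2 = 1 - SSE / SST using only n, sumX, sumY, sumXX, sumXY and sumYY. */
    static double rSquared(int n, double sumX, double sumY,
                           double sumXX, double sumXY, double sumYY) {
        if (n < 2) {
            return Double.NaN;
        }
        double sxx = sumXX - sumX * sumX / n; // sum of (x - xBar)^2
        double sxy = sumXY - sumX * sumY / n; // sum of (x - xBar)(y - yBar)
        double syy = sumYY - sumY * sumY / n; // SST = sum of (y - yBar)^2
        if (sxx == 0 || syy == 0) {
            return Double.NaN; // all x equal (no slope) or all y equal (nothing to explain)
        }
        double a1 = sxy / sxx;       // least-squares slope
        double sse = syy - a1 * sxy; // residual sum of squares for that slope
        return 1.0 - sse / syy;
    }

    public static void main(String[] args) {
        // Sums for the five sample points used by the test program
        double r2 = rSquared(5, 15.0, 700.0, 55.0, 2121.0, 98142.0);
        System.out.println(r2);
    }
}
```

The identity only holds when a1 is the least-squares slope, which is the case here because it is computed from the same sums.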

3. Linear regression testing:

/**
 * File: LinearRegression.java
 * Author: zhouyujie
 * Date: 16:00:00
 * Description: Java implementation of the unary linear regression algorithm;
 * linear regression test class (can be used to predict statistical indicators).
 */
package com.zyujie.dm;

/**
 * <p>
 * <b>Linear regression</b><br>
 * Demonstrates linear regression by constructing the regression line for a set
 * of data points.
 *
 * <p>
 * Requires DataPoint.java and RegressionLine.java.
 *
 * <p>
 * To calculate the least-squares regression line for the given data points,
 * sumX, sumY, sumXX and sumXY must be calculated (note: sumXX = sum(x^2)).
 *
 * <p>
 * <b>Regression line equation: f(x) = a1 * x + a0</b>
 *
 * <p>
 * <b>The slope and intercept are calculated as follows:</b><br>
 * n: number of data points
 * <p>
 * a1 = (n * sumXY - sumX * sumY) / (n * sumXX - sumX^2)<br>
 * a0 = (sumY - sumX * a1) / n<br>
 * (equivalently a0 = averageY - a1 * averageX)
 *
 * <p>
 * <b>Drawing the line: two points determine a straight line.</b><br>
 * The first point is (0, a0). Substitute a value x1 into the equation to get
 * y1, then connect (0, a0) and (x1, y1). To make the line span the whole graph,
 * take x1 = xMax, the largest x coordinate, so the two points are (0, a0) and
 * (xMax, y). If y = a1 * xMax + a0 is greater than yMax, do not use it: set
 * y = yMax instead and solve for x, so the two points are (0, a0) and (x, yMax).
 *
 * <p>
 * <b>Goodness of fit (R^2 in Excel):</b>
 * <p>
 * R^2 = 1 - E
 * <p>
 * Error: E = SSE / SST
 * <p>
 * SSE = sum((yi - y)^2), where y is the fitted value at xi;
 * SST = sumYY - (sumY * sumY) / n
 */
public class LinearRegression {

    /**
     * Main program.
     *
     * @param args the array of runtime arguments
     */
    public static void main(String[] args) {
        RegressionLine line = new RegressionLine();
        line.addDataPoint(new DataPoint(1, 136));
        line.addDataPoint(new DataPoint(2, 143));
        line.addDataPoint(new DataPoint(3, 132));
        line.addDataPoint(new DataPoint(4, 142));
        line.addDataPoint(new DataPoint(5, 147));
        printSums(line);
        printLine(line);
    }

    /**
     * Print the computed sums.
     *
     * @param line the regression line
     */
    private static void printSums(RegressionLine line) {
        System.out.println("\nNumber of data points n = " + line.getDataPointCount());
        System.out.println("\nSum x = " + line.getSumX());
        System.out.println("Sum y = " + line.getSumY());
        System.out.println("Sum xx = " + line.getSumXX());
        System.out.println("Sum xy = " + line.getSumXY());
        System.out.println("Sum yy = " + line.getSumYY());
    }

    /**
     * Print the regression line function.
     *
     * @param line the regression line
     */
    private static void printLine(RegressionLine line) {
        System.out.println("\nRegression formula: Y = " + line.getA1() + "x + " + line.getA0());
        System.out.println("Error: R^2 = " + line.getR());
    }

    // y = 2.1x + 133.7: 2.1 * 6 + 133.7 = 12.6 + 133.7 = 146.3
    // y = 2.1x + 133.7: 2.1 * 7 + 133.7 = 14.7 + 133.7 = 148.4
}
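The line-drawing rule described in the LinearRegression Javadoc (start at (0, a0), end at x = xMax unless the line would leave the plot above yMax) can be sketched as a small function. This is an illustration under stated assumptions, not code from the original: the class name RegressionLineEndpoints is hypothetical, and it assumes the start point (0, a0) already lies inside the plot.

```java
import java.util.Arrays;

// Hypothetical helper sketching the "two points determine a line" rule:
// start at (0, a0) and end at x = xMax, unless the line would pass above
// yMax there, in which case end where it reaches y = yMax instead.
// Assumes the start point (0, a0) already lies inside the plot.
final class RegressionLineEndpoints {

    /** Returns {x0, y0, x1, y1} for drawing y = a1 * x + a0. */
    static double[] endpoints(double a0, double a1, double xMax, double yMax) {
        double x1 = xMax;
        double y1 = a1 * xMax + a0;
        // Only a rising line can cross the top edge before x = xMax; clip it there.
        if (y1 > yMax && a1 > 0) {
            y1 = yMax;
            x1 = (yMax - a0) / a1; // solve yMax = a1 * x + a0 for x
        }
        return new double[] {0.0, a0, x1, y1};
    }

    public static void main(String[] args) {
        // Fitted sample line y = 2.1x + 133.7 with xMax = 5 and yMax = 147: no clipping
        System.out.println(Arrays.toString(endpoints(133.7, 2.1, 5, 147)));
        // A steeper made-up line that would overshoot yMax = 147: clipped at the top
        System.out.println(Arrays.toString(endpoints(133.7, 5.0, 5, 147)));
    }
}
```

For the sample fit, 2.1 * 5 + 133.7 = 144.2 stays below yMax = 147, so the line simply ends at (5, 144.2).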

Running the test class produces the following output:

1,136
2,143
3,132
4,142
5,147

Number of data points n = 5

Sum x = 15.0
Sum y = 700.0
Sum xx = 55.0
Sum xy = 2121.0
Sum yy = 98142.0

Regression formula: Y = 2.1x + 133.7
Error: R^2 = 0.3106
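As a hand check, substituting the printed sums (n = 5) into the slope and intercept formulas from the LinearRegression Javadoc reproduces the coefficients, together with the SST term used for R^2:

```latex
a_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
    = \frac{5 \cdot 2121 - 15 \cdot 700}{5 \cdot 55 - 15^2}
    = \frac{10605 - 10500}{275 - 225} = \frac{105}{50} = 2.1

a_0 = \bar{y} - a_1 \bar{x} = \frac{700}{5} - 2.1 \cdot \frac{15}{5} = 140 - 6.3 = 133.7

SST = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 98142 - \frac{700^2}{5} = 98142 - 98000 = 142
```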

Assume that a company's operating income was:

January: RMB 1.36 million
February: RMB 1.43 million
March: RMB 1.32 million
April: RMB 1.42 million
May: RMB 1.47 million

Based on the regression line formula Y = 2.1x + 133.7, we can predict the income for June (x = 6):

Y = 2.1*6 + 133.7 = 12.6 + 133.7 = 146.3, that is, about RMB 1.463 million.
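The June prediction, and the July value noted in the comment at the end of the test class, can be reproduced with a short self-contained sketch. It assumes x is the month number and y is income in units of RMB 10,000 (so 136 means RMB 1.36 million); the class IncomeForecast is hypothetical and uses double instead of the original float fields.

```java
import java.util.Locale;

// Hypothetical helper: fits y = a1 * x + a0 by least squares from monthly
// figures (x = month number, y = income in units of RMB 10,000) and then
// evaluates the fitted line for later months.
final class IncomeForecast {

    /** Returns {a0, a1} for the least-squares line through the points (x[i], y[i]). */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXX += x[i] * x[i];
            sumXY += x[i] * y[i];
        }
        double a1 = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double a0 = (sumY - a1 * sumX) / n;
        return new double[] {a0, a1};
    }

    public static void main(String[] args) {
        double[] month = {1, 2, 3, 4, 5};
        double[] income = {136, 143, 132, 142, 147}; // RMB 10,000
        double[] c = fit(month, income);
        for (int m = 6; m <= 7; m++) {
            double predicted = c[0] + c[1] * m;
            // Locale.ROOT keeps '.' as the decimal separator regardless of system locale
            System.out.printf(Locale.ROOT, "month %d: %.1f%n", m, predicted);
        }
        // prints:
        // month 6: 146.3
        // month 7: 148.4
    }
}
```

A straight-line fit only extrapolates the trend of the five observed months, so predictions further ahead deserve correspondingly less trust.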
