Linear regression with Weka integrated in Java: an example

Source: Internet
Author: User

After studying logistic regression for classification, let's continue with linear regression. Linear regression is also very common in data mining: based on an existing data set (a matrix of row vectors), we train a suitable rule (a function) that estimates what value any new combination of data (a vector) should have.

Detailed descriptions can be found in many blogs. I have only skimmed the derivation, but the result is simple to summarize: compute a "proper" multivariate linear function y=a0+a1*x1+a2*x2+a3*x3+...+ak*xk. I won't Ctrl+V the theory here; this post only looks at how to integrate it in code.
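To make the formula concrete, here is a minimal plain-Java sketch that evaluates such a function for given coefficients. The names `LinearFunction` and `evaluate` are made up for illustration and are not part of Weka.

```java
// Hypothetical helper, not part of Weka: evaluates y = a0 + a1*x1 + ... + ak*xk.
class LinearFunction {
    static double evaluate(double a0, double[] a, double[] x) {
        if (a.length != x.length) {
            throw new IllegalArgumentException("need exactly one coefficient per variable");
        }
        double y = a0; // start from the constant term
        for (int i = 0; i < a.length; i++) {
            y += a[i] * x[i]; // add each weighted variable
        }
        return y;
    }

    public static void main(String[] args) {
        // y = 1 + 2*x1 + 3*x2 evaluated at (x1, x2) = (10, 100)
        double y = evaluate(1.0, new double[] {2.0, 3.0}, new double[] {10.0, 100.0});
        System.out.println(y);
    }
}
```

Training a regression model amounts to finding the values of a0..ak; prediction is then just this evaluation.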

Weka provides a corresponding class, LinearRegression. As with the classifier before, the model is built first and then used: to use it, construct an Instance and call the classifyInstance function to get the predicted value.
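The "build first, then use" pattern can be illustrated without Weka. The sketch below fits a single-variable line by ordinary least squares and then predicts with it. It is only an illustration of the two steps under that simplification: the class `SimpleFit` and its methods are hypothetical, and this is not how Weka's multivariate LinearRegression is implemented.

```java
// Hypothetical illustration, not Weka code: fit y = a0 + a1*x, then predict.
class SimpleFit {
    final double intercept; // a0
    final double slope;     // a1

    private SimpleFit(double intercept, double slope) {
        this.intercept = intercept;
        this.slope = slope;
    }

    // "Build" step: closed-form least squares for one input variable.
    static SimpleFit train(double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = xs.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += xs[i];
            meanY += ys[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (xs[i] - meanX) * (ys[i] - meanY);
            sxx += (xs[i] - meanX) * (xs[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxy / sxx;
        return new SimpleFit(meanY - slope * meanX, slope);
    }

    // "Use" step: apply the fitted function to a new value.
    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Training points lie exactly on y = 2x + 1.
        SimpleFit model = train(new double[] {0, 1, 2}, new double[] {1, 3, 5});
        System.out.println(model.predict(3));
    }
}
```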

Training model

    import java.io.File;
    import weka.classifiers.AbstractClassifier;
    import weka.classifiers.functions.LinearRegression;
    import weka.core.Instances;
    import weka.core.converters.ArffLoader;

    static AbstractClassifier trainModel(String arffFile, int classIndex) throws Exception {

        File inputFile = new File(arffFile); // training file
        ArffLoader loader = new ArffLoader();
        loader.setFile(inputFile);
        Instances insTrain = loader.getDataSet(); // read the training data
        insTrain.setClassIndex(classIndex); // attribute to be predicted

        LinearRegression linear = new LinearRegression();
        linear.buildClassifier(insTrain); // build the model from the training data

        return linear;
    }

I used a data sample from an IBM technology blog (predicting house prices from the size of the house, the number of rooms, and the number of bathrooms):

@RELATION House

@ATTRIBUTE housesize NUMERIC
@ATTRIBUTE lotsize NUMERIC
@ATTRIBUTE bedrooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE bathroom NUMERIC
@ATTRIBUTE sellingprice NUMERIC

@DATA
3529,9191,6,0,0,205000 
3247,10061,5,1,1,224900 
4032,10150,5,0,1,197900 
2397,14156,4,1,0,189900 
2200,9600,4,0,1,195000 
3536,19994,6,1,1,325000 
2983,9365,5,0,1,230000
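To see why the class index passed to the training function is 5, it helps to number the attributes in the order they are declared. The following is a simplified plain-Java reader for just the small ARFF subset shown above (numeric attributes, comma-separated rows). The class `ArffSketch` is hypothetical and is not Weka's ArffLoader, which handles the full format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified reader for the ARFF subset used in this post; NOT Weka's ArffLoader.
class ArffSketch {
    final List<String> attributes = new ArrayList<>();
    final List<double[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            String upper = line.toUpperCase(Locale.ROOT);
            if (upper.startsWith("@ATTRIBUTE")) {
                // "@ATTRIBUTE name TYPE": the second token is the attribute name
                result.attributes.add(line.split("\\s+")[1]);
            } else if (upper.startsWith("@DATA")) {
                inData = true; // everything after this line is a data row
            } else if (inData) {
                String[] cells = line.split(",");
                double[] row = new double[cells.length];
                for (int i = 0; i < cells.length; i++) {
                    row[i] = Double.parseDouble(cells[i].trim());
                }
                result.rows.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
            "@RELATION house",
            "@ATTRIBUTE housesize NUMERIC",
            "@ATTRIBUTE lotsize NUMERIC",
            "@ATTRIBUTE bedrooms NUMERIC",
            "@ATTRIBUTE granite NUMERIC",
            "@ATTRIBUTE bathroom NUMERIC",
            "@ATTRIBUTE sellingprice NUMERIC",
            "@DATA",
            "3529,9191,6,0,0,205000",
            "3247,10061,5,1,1,224900");
        ArffSketch data = parse(arff);
        // Print each attribute with its zero-based index.
        for (int i = 0; i < data.attributes.size(); i++) {
            System.out.println(i + " -> " + data.attributes.get(i));
        }
    }
}
```

Counting from zero, sellingprice is the sixth attribute and therefore has index 5, which is the value used as the class index in this post.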

One thing puzzled me, though: how to construct a data instance (the vector of independent variables) to pass to the model for prediction. The classifyInstance function takes a weka.core.Instance as its argument, and a search shows that people construct it like this:

Instance ins = new weka.core.Instance(numOfFields);

But this does not even compile. The reason: weka.core.Instance is an interface. (Was Instance an instantiable class in earlier versions?)
The Weka API documentation does list its implementing classes, though.

So the code can be written:

Using a model to predict

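    // Additionally requires: import weka.core.Instance;
    public static void main(String[] args) throws Exception {
        final String arffTrainData = "Data/house.arff";

        // Attribute 5 (sellingprice) is the value to predict.
        AbstractClassifier classifier = trainModel(arffTrainData, 5);

        // Fill in the five independent variables in ARFF order.
        Instance ins = new weka.core.SparseInstance(5);
        ins.setValue(0, 990.8);  // housesize
        ins.setValue(1, 1080.8); // lotsize
        ins.setValue(2, 3);      // bedrooms
        ins.setValue(3, 0);      // granite
        ins.setValue(4, 1);      // bathroom

        double price = classifier.classifyInstance(ins);
        System.out.println("price:" + price);
    }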

Run it:

price:131311.66927984258

To see what the model looks like, you can print the coefficients it has solved for.

LinearRegression linear = new LinearRegression();
......
for (double coef : linear.coefficients()) {
    System.out.println(coef);
}

Running it prints:

-26.688240074108368
7.055124244983151
43166.07667227803
0.0
42292.09008972738
0.0
-21661.120845270096

Two of the coefficients are 0. Matching them against the ARFF file shows that the first belongs to granite, which therefore has no effect on the result. The second 0 sits at the position of the price itself: that attribute is the predicted variable, so its coefficient must be 0. The last value is the constant term.
So the interpretation of this model is:

sellingprice =

    -26.6882 * housesize +
      7.0551 * lotsize +
  43166.0767 * bedrooms +
  42292.0901 * bathroom +
  -21661.1208
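As a sanity check, the model can be evaluated by hand in plain Java, without Weka. The sketch below copies the coefficients from the printed output above (omitting the 0 at the position of the class attribute) and applies them to the same values that were set on the instance in the prediction code. The class name `HousePriceCheck` is made up for illustration.

```java
// Plain-Java check of the fitted model; the numbers are copied from the output above.
class HousePriceCheck {
    // Order follows the ARFF attributes: housesize, lotsize, bedrooms, granite, bathroom.
    static final double[] COEFFICIENTS = {
        -26.688240074108368,
        7.055124244983151,
        43166.07667227803,
        0.0,
        42292.09008972738
    };
    // Constant term: the last value in the printed coefficient list.
    static final double INTERCEPT = -21661.120845270096;

    static double predict(double[] x) {
        if (x.length != COEFFICIENTS.length) {
            throw new IllegalArgumentException("expected " + COEFFICIENTS.length + " values");
        }
        double y = INTERCEPT;
        for (int i = 0; i < COEFFICIENTS.length; i++) {
            y += COEFFICIENTS[i] * x[i];
        }
        return y;
    }

    public static void main(String[] args) {
        // Same values as in the prediction example: housesize, lotsize, bedrooms, granite, bathroom.
        double[] query = {990.8, 1080.8, 3, 0, 1};
        System.out.println("price:" + predict(query));
    }
}
```

The result should agree with the value returned by classifyInstance (about 131311.67), up to floating-point rounding in the last digits.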
