Weka Advanced Applications: Java API

Source: Internet
Author: User
1. Introduction

This article collects my notes from studying the book "Data Mining and Machine Learning: WEKA Application Technology and Practice". An electronic version of the book is available at: http://download.csdn.net/detail/fhb292262794/8759397
The previous blog post summarized how to demonstrate machine learning algorithms with Weka, mainly by operating the Weka 3.8 client software.
This article does the same work through Java API calls, so that Weka's machine learning algorithms can be used to process data from within a program.

The book's examples use Weka 3.7. I downloaded the latest version, Weka 3.8, updated the code to work with it, and recorded the results below.

1. Classification (hands-on coding)

1.1 Linear regression

Predicting a house price

House price data:

@RELATION House

@ATTRIBUTE housesize NUMERIC
@ATTRIBUTE lotsize NUMERIC
@ATTRIBUTE bedrooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE bathroom NUMERIC
@ATTRIBUTE sellingprice NUMERIC

@DATA
3529,9191,6,0,0,205000 
3247,10061,5,1,1,224900 
4032,10150,5,0,1,197900 
2397,14156,4,1,0,189900 
2200,9600,4,0,1,195000 
3536,19994,6,1,1,325000 
2983,9365,5,0,1,230000 

Task: predict the price of a new house from the attributes and selling prices of houses in the surrounding area. The new house has these attributes: housesize: 3198; lotsize: 9669; bedrooms: 5; granite: 3; bathroom: 1. Predict its selling price.

The steps needed are:
1. Load the house price data.
2. Set the class attribute (the attribute to predict).
3. Build the classifier and compute the regression coefficients.
4. Use the regression coefficients to predict the price of the unknown house.

The code is as follows:

    import weka.classifiers.functions.LinearRegression;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils;

    public static final String WEKA_PATH = "data/weka/";
    public static final String WEATHER_NOMINAL_PATH = "data/weka/weather.nominal.arff";
    public static final String WEATHER_NUMERIC_PATH = "data/weka/weather.numeric.arff";
    public static final String SEGMENT_CHALLENGE_PATH = "data/weka/segment-challenge.arff";
    public static final String SEGMENT_TEST_PATH = "data/weka/segment-test.arff";
    public static final String IONOSPHERE_PATH = "data/weka/ionosphere.arff";

    public static void pln(String str) {
        System.out.println(str);
    }

    @Test
    public void testLinearRegression() throws Exception {
        // Load the data; the last attribute (sellingprice) is the class.
        Instances dataset = ConverterUtils.DataSource.read(WEKA_PATH + "houses.arff");
        dataset.setClassIndex(dataset.numAttributes() - 1);
        LinearRegression linearRegression = new LinearRegression();
        try {
            linearRegression.buildClassifier(dataset);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // coefficients() has one slot per attribute (the class slot is unused)
        // followed by the intercept, so coef[6] is the intercept here.
        double[] coef = linearRegression.coefficients();
        double myHouseValue = (coef[0] * 3198) + (coef[1] * 9669) + (coef[2] * 5)
                + (coef[3] * 3) + (coef[4] * 1) + coef[6];
        System.out.println(myHouseValue);
    }
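
The prediction step in the test above is a dot product plus an intercept. The following is a minimal sketch of that arithmetic in plain Java, with no Weka dependency. It assumes the coefficient layout used above: one slot per attribute, including an unused slot for the class attribute in the last attribute position, followed by the intercept. The class name `HousePriceSketch`, the method `predict`, and the coefficient values are hypothetical and made up for illustration; they are not the coefficients Weka computes for the data above.

```java
class HousePriceSketch {
    /**
     * coef: one slot per attribute (the class slot, assumed to be the last
     * attribute, is ignored), followed by the intercept.
     * features: predictor values in attribute order, without the class value.
     */
    static double predict(double[] coef, double[] features) {
        if (coef.length != features.length + 2) {
            throw new IllegalArgumentException("expected predictors + class slot + intercept");
        }
        double sum = coef[coef.length - 1]; // start from the intercept
        for (int i = 0; i < features.length; i++) {
            sum += coef[i] * features[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical coefficients, for illustration only.
        double[] coef = {10.0, 2.0, 0.0, 500.0, 1000.0, 0.0, 20000.0};
        // housesize, lotsize, bedrooms, granite, bathroom of the new house.
        double[] house = {3198, 9669, 5, 3, 1};
        System.out.println(predict(coef, house));
    }
}
```

Looping over the predictors avoids hard-coding an index per attribute, so the same helper keeps working if attributes are added to or removed from the ARFF file.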
1.2 Random Forest

Code:

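    import java.io.File;
    import weka.classifiers.trees.RandomForest;
    import weka.core.Instances;
    import weka.core.converters.ArffLoader;

    @Test
    public void testRandomForestClassifier() throws Exception {
        // Load the training data and use the last attribute as the class.
        ArffLoader loader = new ArffLoader();
        loader.setFile(new File(WEKA_PATH + "segment-challenge.arff"));
        Instances instances = loader.getDataSet();
        instances.setClassIndex(instances.numAttributes() - 1);
        System.out.println(instances);
        System.out.println("------------");

        // Build the forest and print a description of the model.
        RandomForest rf = new RandomForest();
        rf.buildClassifier(instances);
        System.out.println(rf);
    }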
1.3 Meta classifier
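    import java.util.Random;
    import weka.attributeSelection.CfsSubsetEval;
    import weka.attributeSelection.GreedyStepwise;
    import weka.classifiers.Evaluation;
    import weka.classifiers.meta.AttributeSelectedClassifier;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils;

    // Meta classifier: attribute selection wrapped around a base classifier
    @Test
    public void testMetaClassifier() throws Exception {
        Instances data = ConverterUtils.DataSource.read(WEATHER_NUMERIC_PATH);
        // A class index of -1 means no class attribute has been set yet.
        if (data.classIndex() == -1)
            data.setClassIndex(data.numAttributes() - 1);

        AttributeSelectedClassifier classifier = new AttributeSelectedClassifier();
        CfsSubsetEval eval = new CfsSubsetEval();
        GreedyStepwise stepwise = new GreedyStepwise();
        stepwise.setSearchBackwards(true);
        J48 base = new J48();
        classifier.setClassifier(base);
        classifier.setEvaluator(eval);
        classifier.setSearch(stepwise);

        // The fold count (10) is an assumption; it is missing from the source listing.
        Evaluation evaluation = new Evaluation(data);
        evaluation.crossValidateModel(classifier, data, 10, new Random(1234));
        pln(evaluation.toSummaryString());
    }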
1.4 Predicting classification results (batch processing)

Code:

    import java.io.File;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.Utils;
    import weka.core.converters.ArffLoader;

    /**
     * Uses a model built on the training set to predict the class of each
     * instance in the test set, as a batch.
     */
    @Test
    public void testOutputClassDistribution() throws Exception {
        ArffLoader loader = new ArffLoader();
        loader.setFile(new File(SEGMENT_CHALLENGE_PATH));
        Instances train = loader.getDataSet();
        train.setClassIndex(train.numAttributes() - 1);

        ArffLoader loader1 = new ArffLoader();
        loader1.setFile(new File(SEGMENT_TEST_PATH));
        Instances test = loader1.getDataSet();
        test.setClassIndex(test.numAttributes() - 1);

        J48 classifier = new J48();
        classifier.buildClassifier(train);

        System.out.println("Num\t-\tfact\t-\tpred\t-\terr\t-\tdistribution");
        for (int i = 0; i < test.numInstances(); i++) {
            double pred = classifier.classifyInstance(test.instance(i));
            double[] dist = classifier.distributionForInstance(test.instance(i));
            StringBuilder sb = new StringBuilder();
            sb.append(i + 1).append("-")
                    .append(test.instance(i).toString(test.classIndex())).append("-")
                    .append(test.classAttribute().value((int) pred)).append("-");
            // Flag an error when the prediction differs from the actual class.
            if (pred != test.instance(i).classValue()) sb.append("yes");
            else sb.append("no");
            sb.append("-");
            sb.append(Utils.arrayToString(dist));
            System.out.println(sb.toString());
        }
    }

J48, a decision tree classifier, is used here. Another classifier can be substituted; compare their results to choose the one that works best.

1.5 Cross-validation

Code:

    import weka.classifiers.AbstractClassifier;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Debug;
    import weka.core.Instances;
    import weka.core.OptionHandler;
    import weka.core.Utils;
    import weka.core.converters.ConverterUtils;
    import weka.filters.Filter;
    import weka.filters.supervised.attribute.AddClassification;

    // Cross-validation with the predictions added to the data
    @Test
    public void testOnceCVAndPrediction() throws Exception {
        Instances data = ConverterUtils.DataSource.read(IONOSPHERE_PATH);
        data.setClassIndex(data.numAttributes() - 1);
        Classifier classifier = new J48();
        int seed = 1234;
        int folds = 10;

        // Shuffle a copy of the data, and stratify it when the class is nominal.
        Debug.Random random = new Debug.Random(seed);
        Instances newData = new Instances(data);
        newData.randomize(random);
        if (newData.classAttribute().isNominal()) newData.stratify(folds);

        // Perform cross-validation and add the predictions.
        Instances predictedData = null;
        Evaluation eval = new Evaluation(newData);
        for (int i = 0; i < folds; i++) {
            Instances train = newData.trainCV(folds, i);
            Instances test = newData.testCV(folds, i);
            Classifier clsCopy = AbstractClassifier.makeCopy(classifier);
            clsCopy.buildClassifier(train);
            eval.evaluateModel(clsCopy, test);

            // Add the predictions for this fold.
            AddClassification filter = new AddClassification();
            filter.setClassifier(classifier);
            filter.setOutputClassification(true);
            filter.setOutputDistribution(true);
            filter.setOutputErrorFlag(true);
            filter.setInputFormat(train);
            Filter.useFilter(train, filter); // trains the classifier inside the filter
            Instances pred = Filter.useFilter(test, filter);
            if (predictedData == null) predictedData = new Instances(pred, 0);
            for (int j = 0; j < pred.numInstances(); j++) predictedData.add(pred.instance(j));
        }

        pln("classifier: " + classifier.getClass().getName() + " "
                + Utils.joinOptions(((OptionHandler) classifier).getOptions()));
        pln("Data: " + data.relationName());
        pln("seed: " + seed);
        pln(eval.toSummaryString("=== " + folds + " test ===", false));

        // Write the data, including the predictions, to a file.
        ConverterUtils.DataSink.write(WEKA_PATH + "predictions.arff", predictedData);
    }
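
Each pass of the loop above holds out one fold for testing and trains on the remaining folds. The following is a minimal sketch, in plain Java without Weka, of one way to cut n already-shuffled instances into contiguous test folds. The class `FoldSketch` and the method `testRange` are hypothetical names, and the rule for handling the remainder (giving one extra instance to each of the first folds) is an assumption chosen for illustration, not something taken from Weka's source.

```java
class FoldSketch {
    /** Returns {start, endExclusive} of the test fold `fold` among n instances. */
    static int[] testRange(int n, int folds, int fold) {
        if (folds < 2 || folds > n || fold < 0 || fold >= folds) {
            throw new IllegalArgumentException("invalid fold configuration");
        }
        int base = n / folds;   // minimum size of every fold
        int extra = n % folds;  // the first `extra` folds get one more instance
        int start = fold * base + Math.min(fold, extra);
        int size = base + (fold < extra ? 1 : 0);
        return new int[]{start, start + size};
    }

    public static void main(String[] args) {
        int n = 23;
        int folds = 5;
        for (int i = 0; i < folds; i++) {
            int[] r = testRange(n, folds, i);
            // Everything outside [r[0], r[1]) would be the training set for fold i.
            System.out.println("fold " + i + ": test = [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Because the test ranges are disjoint and together cover all n instances, every instance is used for testing exactly once, which is why the loop above can collect one prediction per instance.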
2. Clustering (hands-on coding)

2.1 EM
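    import weka.clusterers.EM;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils;

    @Test
    public void testEM() throws Exception {
        Instances instances = ConverterUtils.DataSource.read(WEKA_PATH + "contact-lenses.arff");
        EM cluster = new EM();
        // -I sets the maximum number of iterations. The value 100 is an assumption,
        // because the option strings are garbled in the source listing.
        cluster.setOptions(new String[]{"-I", "100"});
        cluster.buildClusterer(instances);
        pln(cluster.toString());
    }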
2.2 Evaluating a clusterer
    import weka.clusterers.ClusterEvaluation;
    import weka.clusterers.DensityBasedClusterer;
    import weka.clusterers.EM;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils;

    // Three ways to evaluate a clusterer
    @Test
    public void testEvaluation() throws Exception {
        String filePath = WEKA_PATH + "contact-lenses.arff";
        Instances instances = ConverterUtils.DataSource.read(filePath);

        // 1st way: pass command-line style options (-t names the training file)
        String[] options = new String[]{"-t", filePath};
        String output = ClusterEvaluation.evaluateClusterer(new EM(), options);
        pln(output);

        // 2nd way: build the clusterer first, then evaluate it
        DensityBasedClusterer dbc = new EM();
        dbc.buildClusterer(instances);
        ClusterEvaluation clusterEvaluation = new ClusterEvaluation();
        clusterEvaluation.setClusterer(dbc);
        clusterEvaluation.evaluateClusterer(new Instances(instances));
        pln(clusterEvaluation.clusterResultsToString());

        // 3rd way: cross-validation of a density-based clusterer
        DensityBasedClusterer newDbc = new EM();
        double logLikelyhood = ClusterEvaluation.crossValidateModel(newDbc, instances, 10,
                instances.getRandomNumberGenerator(1234));
        pln("logLikelyhood: " + logLikelyhood);
    }
2.3 Clustering and evaluation
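    import weka.clusterers.ClusterEvaluation;
    import weka.clusterers.Clusterer;
    import weka.clusterers.EM;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;

    @Test
    public void testClassesToClusters() throws Exception {
        String filePath = WEKA_PATH + "contact-lenses.arff";
        Instances data = ConverterUtils.DataSource.read(filePath);
        data.setClassIndex(data.numAttributes() - 1);

        // Remove the class attribute before clustering.
        // Remove takes 1-based attribute indices, classIndex() is 0-based.
        Remove remove = new Remove();
        remove.setAttributeIndices("" + (data.classIndex() + 1));
        remove.setInputFormat(data);
        Instances dataCluster = Filter.useFilter(data, remove);

        Clusterer cluster = new EM();
        cluster.buildClusterer(dataCluster);

        // Evaluate on the data that still contains the class attribute.
        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(cluster);
        eval.evaluateClusterer(data);

        pln(eval.clusterResultsToString());
    }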
2.4 Outputting cluster assignments
    import weka.clusterers.EM;
    import weka.core.Instances;
    import weka.core.Utils;
    import weka.core.converters.ConverterUtils;

    @Test
    public void testOutputClusterDistribution() throws Exception {
        Instances train = ConverterUtils.DataSource.read(SEGMENT_CHALLENGE_PATH);
        Instances test = ConverterUtils.DataSource.read(SEGMENT_TEST_PATH);
        if (!train.equalHeaders(test))
            throw new Exception("Train data and test data not the same.");

        EM clusterer = new EM();
        clusterer.buildClusterer(train);
        pln("id-cluster-distribution");
        for (int i = 0; i < test.numInstances(); i++) {
            // Assigned cluster and the membership distribution over all clusters.
            int cluster = clusterer.clusterInstance(test.instance(i));
            double[] dists = clusterer.distributionForInstance(test.instance(i));
            StringBuilder sb = new StringBuilder();
            sb.append(i + 1).append("-").append(cluster).append("-").append(Utils.arrayToString(dists));
            pln(sb.toString());
        }
    }
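
The listing above prints both the assigned cluster and the membership distribution for each instance. The following minimal sketch, in plain Java without Weka, shows how the two relate under the assumption that an instance is assigned to the cluster with the highest membership probability. The class `ClusterAssignSketch`, the method `maxIndex`, and the sample distribution are hypothetical and only for illustration.

```java
class ClusterAssignSketch {
    /** Returns the index of the largest value; ties go to the lowest index. */
    static int maxIndex(double[] dist) {
        if (dist.length == 0) {
            throw new IllegalArgumentException("empty distribution");
        }
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical membership probabilities for three clusters.
        double[] dists = {0.1, 0.7, 0.2};
        System.out.println("assigned cluster: " + maxIndex(dists));
    }
}
```

Printing the full distribution next to the assigned cluster, as the listing does, makes it easy to spot instances whose assignment is a close call between two clusters.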
3. Attribute selection (hands-on coding)

Automatic attribute selection

Using CfsSubsetEval and GreedyStepwise:

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.CfsSubsetEval;
    import weka.attributeSelection.GreedyStepwise;
    import weka.core.Instances;
    import weka.core.Utils;
    import weka.core.converters.ConverterUtils;

    // Attribute selection with the low-level API
    @Test
    public void testUseLowApi() throws Exception {
        ConverterUtils.DataSource source = new ConverterUtils.DataSource(WEATHER_NOMINAL_PATH);
        Instances data = source.getDataSet();
        // A class index of -1 means no class attribute has been set yet.
        if (data.classIndex() == -1)
            data.setClassIndex(data.numAttributes() - 1);
        AttributeSelection attributeSelection = new AttributeSelection();
        CfsSubsetEval eval = new CfsSubsetEval();
        GreedyStepwise search = new GreedyStepwise();
        search.setSearchBackwards(true);
        attributeSelection.setEvaluator(eval);
        attributeSelection.setSearch(search);
        // Note the capital S: the Weka method is named SelectAttributes.
        attributeSelection.SelectAttributes(data);
        int[] indices = attributeSelection.selectedAttributes();
        pln(Utils.arrayToString(indices));
    }
4. Other

4.1 Database table operations
    import java.io.File;
    import weka.core.Instances;
    import weka.core.converters.CSVSaver;
    import weka.core.converters.DatabaseLoader;

    @Test
    public void testSaveCsv() throws Exception {
        // SqlUtil is the author's own helper holding the connection settings (not shown).
        DatabaseLoader loader = new DatabaseLoader();
        loader.setUrl(SqlUtil.url);
        loader.setUser(SqlUtil.user);
        loader.setPassword(SqlUtil.password);
        loader.setQuery("select question from question");
        Instances data1 = loader.getDataSet();
        if (data1.classIndex() == -1)
            data1.setClassIndex(data1.numAttributes() - 1);
        System.out.println(data1);

        // Save the query result as a CSV file.
        CSVSaver saver = new CSVSaver();
        saver.setInstances(data1);
        saver.setFile(new File("data/weka/baidubook-csvsaver.csv"));
        saver.writeBatch();
    }
4.2 Filters

Filtering

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;

    @Test
    public void testFilter() throws Exception {
        Instances instances = ConverterUtils.DataSource.read("data/weka/houses.arff");
        instances.setClassIndex(instances.numAttributes() - 1);
        System.out.println(instances);

        // -R 1 removes the first attribute (attribute indices are 1-based).
        String[] options = new String[2];
        options[0] = "-R";
        options[1] = "1";
        Remove remove = new Remove();
        remove.setOptions(options);
        remove.setInputFormat(instances);
        Instances newData = Filter.useFilter(instances, remove);
        System.out.println(newData);
    }
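
The option `-R 1` in the filter above refers to the first attribute, because the Remove filter counts attributes from 1, while Java arrays and `classIndex()` count from 0. The following minimal sketch, in plain Java without Weka, shows the effect of dropping a column by a 1-based index on a single row of values. The class `RemoveColumnSketch` and the method `removeColumn` are hypothetical names used only for illustration; the sample row is the first data row of the house file.

```java
import java.util.Arrays;

class RemoveColumnSketch {
    /** Returns a copy of `row` without the column at the given 1-based index. */
    static double[] removeColumn(double[] row, int oneBasedIndex) {
        if (oneBasedIndex < 1 || oneBasedIndex > row.length) {
            throw new IllegalArgumentException("index out of range: " + oneBasedIndex);
        }
        double[] out = new double[row.length - 1];
        int k = 0;
        for (int i = 0; i < row.length; i++) {
            if (i != oneBasedIndex - 1) { // convert to a 0-based array index
                out[k++] = row[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // housesize, lotsize, bedrooms, granite, bathroom, sellingprice
        double[] row = {3529, 9191, 6, 0, 0, 205000};
        // Dropping column 1 removes housesize and keeps the other five values.
        System.out.println(Arrays.toString(removeColumn(row, 1)));
    }
}
```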

Filter and classify

    @Test
    public void testFilterOnTheFly() throws Exception {
        Instances instances = ConverterUtils.DataSource.read("data/weka/weather.nominal.arff");
        instances.setClassIndex(instances.numAttributes() - 1);
        System.out.println(instances);
        Remove remove = new Remove();
        remove