Experiment on Data Mining classification (II)

Source: Internet
Author: User

(2) Experiment process

 

A. Environment Construction

This experiment uses the Indian Liver Patient Dataset (ILPD). Classification is done with WEKA 3.6.9, and the programming environment is Eclipse + JDK 7.

1. Dataset acquisition

Go to the download page for the Indian Liver Patient Dataset (ILPD) and download the dataset.

2. Install WEKA

Download the WEKA installation package weka-3-6-9-x64.exe and run it to install WEKA.

 

Installation interface:

 

Installation completed:

 

WEKA working interface:

 

B. Experiment steps

 

1. Build a development platform

Open Eclipse, click File > New > Project..., and create a Java project, as shown in the figure.

Name the new Java project dataminingtest, then use Configure Build Path to import weka.jar and any other required JAR packages into the project, as shown in the figure.

Note: A JAR dependency error may occur when a WEKA algorithm is called. In that case, add the missing JAR packages to the build path.

 

2. Import Data

Import the ARFF file produced during data preparation and print the data, as shown in the figure.
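To make the structure of an ARFF file concrete, here is a minimal plain-Java sketch that reads the header and the data rows and prints them. It is not WEKA's loader (WEKA ships its own, for example weka.core.converters.ConverterUtils.DataSource), it only handles simple unquoted, comma-separated files, and the class name ArffSketch and the sample attribute names and values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal ARFF reader sketch (NOT WEKA's loader): supports only simple,
// unquoted, comma-separated data rows; no sparse rows, no quoted names.
class ArffSketch {
    final List<String> attributeNames = new ArrayList<String>();
    final List<String[]> rows = new ArrayList<String[]>();

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase();
            if (!inData && lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": keep only the name
                result.attributeNames.add(line.split("\\s+", 3)[1]);
            } else if (!inData && lower.startsWith("@data")) {
                inData = true; // everything after @data is an instance row
            } else if (inData) {
                result.rows.add(line.split("\\s*,\\s*"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample; the real file has more attributes and rows.
        String sample = "@relation demo\n"
                + "@attribute age numeric\n"
                + "@attribute tb numeric\n"
                + "@attribute class {1,2}\n"
                + "@data\n"
                + "65,0.7,1\n"
                + "62,10.9,2\n";
        ArffSketch data = parse(sample);
        System.out.println(data.attributeNames);
        for (String[] row : data.rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```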

3. Data preprocessing

In this experiment, we use WEKA filters for data preprocessing. The general process is: instantiate the filter -> set the filter options -> apply the filter with Filter.useFilter.

Because the dataset passed to the J48 decision tree algorithm in this experiment is discretized, a discretization filter is used. The figure shows the implementation:

The figure shows how the dataset changes after filtering:

 

Note: Because the class attribute in the last column is numeric, it must be converted to nominal type on the WEKA platform. The conversion also uses a filter, as shown in the figure.
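To illustrate what discretization does to a numeric attribute, here is a minimal plain-Java sketch of equal-width binning, which is the default mode of WEKA's unsupervised Discretize filter. It is not the WEKA implementation: the class name EqualWidthDiscretizer and the sample ages are made up, and the handling of values that fall exactly on a bin boundary may differ from WEKA's.

```java
import java.util.Arrays;

// Equal-width binning sketch (an illustration, not WEKA's Discretize code).
class EqualWidthDiscretizer {

    // Maps each value to a bin index in [0, bins - 1].
    static int[] discretize(double[] values, int bins) {
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / bins;
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            if (width == 0) {
                result[i] = 0; // all values identical: a single bin
                continue;
            }
            int bin = (int) ((values[i] - min) / width);
            result[i] = Math.min(bin, bins - 1); // the maximum goes into the last bin
        }
        return result;
    }

    public static void main(String[] args) {
        double[] ages = {20, 35, 50, 65, 80}; // made-up sample values
        // Three bins of width 20: [20,40), [40,60), [60,80]
        System.out.println(Arrays.toString(discretize(ages, 3))); // prints [0, 0, 1, 2, 2]
    }
}
```

After this step each numeric value is replaced by the label of its interval, so the attribute can be treated as nominal.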

 

 

4. Select an algorithm and create a model

To obtain the optimal model, we need to tune the parameters of the decision tree. Here we use the WEKA platform and look for the best model by varying the number of instances per leaf node. The steps are as follows:

 

1. Open the WEKA software, click Explorer, select Open file..., and open the train_data.arff file, as shown in the figure.

 

2. Click Choose in the Filter area, select NumericToNominal and Discretize under unsupervised > attribute, and click Apply to complete data preprocessing. The result is shown in the figure.

 

3. Open the Classify tab, choose the J48 algorithm under trees in the Classifier area, select Cross-validation under Test options, and enter 10 in the Folds box, as shown in the figure.

4. Keep the default configuration and click Start. The result is as follows:

 

5. The result shows a classification accuracy of 62.4% under the default configuration. To obtain the optimal model, we vary the number of instances per leaf node of the tree and rerun the test. The results are as follows:

 

Instance/node count    Accuracy
2                      64.2%
3                      64.6%
5                      63.2%
6                      64%
7                      64.6%
8                      63%
9                      65.8%
10                     66.2%
20                     66.4%
30                     65.8%

 

The table shows that the accuracy is highest (66.4%) when the number of instances is 20, so we use this configuration to generate the model. To generate it, right-click the entry for the run with 20 instances in the Result list area and select Save model from the context menu. The procedure is shown in the figure.
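The choice of 20 can be double-checked from the table with a few lines of plain Java. This is only a sketch of the selection step: the numbers are copied from the table, and the class name BestSetting and its fields are hypothetical.

```java
// Picks the setting with the highest cross-validation accuracy from the table.
class BestSetting {
    // Instance/node counts and accuracies (in percent) copied from the table.
    static final int[] INSTANCE_COUNTS = {2, 3, 5, 6, 7, 8, 9, 10, 20, 30};
    static final double[] ACCURACY = {64.2, 64.6, 63.2, 64.0, 64.6, 63.0, 65.8, 66.2, 66.4, 65.8};

    // Returns the index of the first maximum in xs.
    static int indexOfMax(double[] xs) {
        int best = 0;
        for (int i = 1; i < xs.length; i++) {
            if (xs[i] > xs[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int best = indexOfMax(ACCURACY);
        System.out.println(INSTANCE_COUNTS[best] + " -> " + ACCURACY[best] + "%"); // prints 20 -> 66.4%
    }
}
```

The differences between neighboring settings are small (66.2% at 10 versus 66.4% at 20), so the result mainly shows that larger values work slightly better on this training set.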

 

 

(3) Model evaluation

To evaluate the correctness of the model, we verify it on the test set.

 

The verification result is as follows:

The prediction results show that only four of the 83 test instances are predicted incorrectly. The actual accuracy on the test set is therefore 95.2%.
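The 95.2% figure corresponds to four misclassified instances out of 83, that is, 79 correct predictions. A minimal sketch of the arithmetic (the class and method names are hypothetical):

```java
import java.util.Locale;

// Accuracy on the test set from the total and the number of wrong predictions.
class TestAccuracy {
    static double accuracyPercent(int total, int wrong) {
        return 100.0 * (total - wrong) / total;
    }

    public static void main(String[] args) {
        // 83 test instances, 4 of them misclassified: 79 / 83
        System.out.println(String.format(Locale.US, "%.1f%%", accuracyPercent(83, 4))); // prints 95.2%
    }
}
```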

 
