Experiment on Data Mining classification (II)

Last Update:2018-12-03 Source: Internet

Author: User


(2) Experiment process

A. Environment Construction

This experiment uses the Indian Liver Patient Dataset (ILPD). The data mining tool is WEKA 3.6.9, and the programming environment is Eclipse with JDK 7.

1. Dataset acquisition

Go to the download page of the Indian Liver Patient Dataset (ILPD) and download the dataset.

2. Install WEKA

Download the WEKA installation package weka-3-6-9-x64.exe and run it to install WEKA.

Installation interface:

Installation completed:

WEKA working interface:

B. Experiment steps

1. Build a development platform

Open Eclipse and click File > New > Project... to create a Java project.

Name the new Java project dataminingtest, then configure the build path to add weka.jar and any other required JAR packages to the project.

Note: Calling a WEKA algorithm may fail with a JAR dependency error. In that case, add the missing JAR packages to the build path.

2. Import Data

Import the ARFF file created during data preparation and print its contents.
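To illustrate what importing and printing an ARFF file involves, here is a minimal sketch in plain Java. It is not the code used in the experiment and it does not call WEKA (in a WEKA project the file would normally be loaded through WEKA's own loader, such as weka.core.converters.ConverterUtils.DataSource). The class name SimpleArff and the sample content are hypothetical, and the parser deliberately ignores quoting, sparse rows, and other parts of the ARFF format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified ARFF reader for illustration only (hypothetical helper, not part of WEKA).
class SimpleArff {
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    // Parses ARFF text: collects attribute names from the header and raw cells from the data section.
    static SimpleArff parse(String text) {
        SimpleArff arff = new SimpleArff();
        boolean inData = false;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase();
            if (inData) {
                // Each data line is a comma-separated list of values.
                String[] cells = line.split(",");
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = cells[i].trim();
                }
                arff.rows.add(cells);
            } else if (lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": keep only the name.
                String[] parts = line.split("\\s+", 3);
                if (parts.length >= 2) {
                    arff.attributeNames.add(parts[1]);
                }
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return arff;
    }

    public static void main(String[] args) {
        // Made-up sample content, not real rows from the ILPD dataset.
        String text = "@relation demo\n"
                + "@attribute age numeric\n"
                + "@attribute gender {Male,Female}\n"
                + "@attribute class numeric\n"
                + "@data\n"
                + "40,Male,1\n"
                + "55,Female,2\n";
        SimpleArff arff = SimpleArff.parse(text);
        System.out.println("Attributes: " + arff.attributeNames);
        for (String[] row : arff.rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The header section supplies the attribute names and the @data section supplies one instance per line, which is the same structure WEKA displays after the file is opened.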

3. Data preprocessing

In this experiment, we use WEKA filters for data preprocessing. The general workflow is: instantiate the filter, set its parameters, and then apply it with Filter.useFilter.

Because this experiment runs the J48 decision tree algorithm on discretized data, a discretization filter is applied. After filtering, the numeric attributes of the dataset are replaced by nominal attributes whose values are intervals.
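The idea behind discretization can be sketched in plain Java. This is only an illustration, assuming equal-width binning (the default mode of WEKA's unsupervised Discretize filter); it is not WEKA's implementation, and details such as how values exactly on an interval boundary are assigned may differ. The class name EqualWidthDiscretizer and the sample values are hypothetical.

```java
import java.util.Arrays;

// Equal-width binning sketch (hypothetical helper, not WEKA code).
class EqualWidthDiscretizer {
    // Returns, for each value, the index (0..bins-1) of the equal-width interval it falls in.
    static int[] discretize(double[] values, int bins) {
        if (values.length == 0 || bins < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / bins;
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            if (width == 0) {
                result[i] = 0; // all values are identical, so there is a single bin
                continue;
            }
            int bin = (int) ((values[i] - min) / width);
            result[i] = Math.min(bin, bins - 1); // the maximum value belongs to the last bin
        }
        return result;
    }

    public static void main(String[] args) {
        double[] ages = {20, 35, 50, 65, 80}; // made-up values
        // Three bins of width 20: [20,40), [40,60), [60,80]
        System.out.println(Arrays.toString(discretize(ages, 3))); // prints [0, 0, 1, 2, 2]
    }
}
```

Each numeric value is replaced by the index of its interval, which is what turns a numeric attribute into a nominal one with a small, fixed set of values.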

Note: Because the class attribute in the last column is numeric, it must be converted to the nominal type on the WEKA platform. This conversion is also done with a filter.
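Conceptually, converting a numeric column to a nominal one means treating each distinct number as a category label. The following plain-Java sketch shows that idea under stated assumptions: the class name NumericToNominalSketch is hypothetical, the class codes 1 and 2 are placeholder values, and the code is an illustration rather than WEKA's filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Numeric-to-nominal conversion sketch (hypothetical helper, not WEKA code).
class NumericToNominalSketch {
    // The distinct values of the column, in ascending order, become the nominal labels.
    static List<String> labels(double[] column) {
        TreeSet<Double> distinct = new TreeSet<>();
        for (double v : column) {
            distinct.add(v);
        }
        List<String> out = new ArrayList<>();
        for (double v : distinct) {
            out.add(format(v));
        }
        return out;
    }

    // Replaces every numeric cell by its label.
    static String[] convert(double[] column) {
        String[] out = new String[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = format(column[i]);
        }
        return out;
    }

    // Whole numbers are written without a decimal part, e.g. 1.0 becomes "1".
    static String format(double v) {
        return v == Math.rint(v) ? String.valueOf((long) v) : String.valueOf(v);
    }

    public static void main(String[] args) {
        double[] classColumn = {2, 1, 1, 2}; // placeholder class codes
        System.out.println("Labels: " + labels(classColumn));            // prints Labels: [1, 2]
        System.out.println(Arrays.toString(convert(classColumn)));      // prints [2, 1, 1, 2]
    }
}
```

After the conversion the classifier sees a fixed set of class labels instead of a number, which is what a classification algorithm expects as its target.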

4. Select an algorithm and create a model.

To obtain the best model, we need to tune the decision tree's parameters. Here we use the WEKA platform and search for the best model by varying the number of instances per leaf node. The experiment steps are as follows:

1. Open WEKA, click Explorer, select Open file..., and open the train_data.arff file.

2. Click Choose in the Filter area, select NumericToNominal and Discretize under unsupervised > attribute, and click Apply to complete data preprocessing.

3. Switch to the Classify tab, choose the J48 algorithm under trees in the Classifier area, select Cross-validation in Test options, and enter 10 in the Folds box.

4. Keep the default configuration and click Start.

5. The result shows a classification accuracy of 62.4% under the default configuration. To find a better model, we vary the number of instances per leaf node and rerun the test. The test results are as follows:

Instance/node count	2	3	5	6	7	8	9	10	20	30
Accuracy	64.2%	64.6%	63.2%	64%	64.6%	63%	65.8%	66.2%	66.4%	65.8%

The table shows that accuracy is highest (66.4%) when the instance count is 20, so we use this configuration to generate the model. To do so, right-click the entry for instance count 20 in the Result list area and select Save model from the context menu.
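The selection step can be checked with a few lines of plain Java. This sketch only picks the best column from the accuracy table above. It does not train any model, and the class name BestSetting is hypothetical. On a tie it keeps the first maximum.

```java
// Picks the setting with the highest accuracy from a results table (hypothetical helper).
class BestSetting {
    // Returns the index of the first maximum in the array.
    static int bestIndex(double[] accuracy) {
        if (accuracy.length == 0) {
            throw new IllegalArgumentException("no results to compare");
        }
        int best = 0;
        for (int i = 1; i < accuracy.length; i++) {
            if (accuracy[i] > accuracy[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Values copied from the table in this section.
        int[] instanceCounts = {2, 3, 5, 6, 7, 8, 9, 10, 20, 30};
        double[] accuracy = {64.2, 64.6, 63.2, 64.0, 64.6, 63.0, 65.8, 66.2, 66.4, 65.8};
        int best = bestIndex(accuracy);
        // prints Best instance count: 20 (66.4%)
        System.out.println("Best instance count: " + instanceCounts[best] + " (" + accuracy[best] + "%)");
    }
}
```

The differences between neighboring settings are small (for example 66.2% at 10 versus 66.4% at 20), so the choice of 20 reflects this particular cross-validation run.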

(3) Model evaluation

To evaluate the correctness of the model, we use the test set for verification.

The verification result is as follows:

The prediction results show that only four of the 83 test instances are predicted incorrectly, so the accuracy on the test set is 79/83, or about 95.2%.
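The accuracy figure can be reproduced with a short plain-Java sketch. The label arrays below are placeholders constructed to contain 83 instances with four mismatches. They are not the actual predictions from the experiment, and the class name Accuracy is hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

// Accuracy = correctly predicted instances / all instances (hypothetical helper).
class Accuracy {
    static double accuracy(int[] actual, int[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    public static void main(String[] args) {
        // Placeholder labels: 83 instances, all of class 1.
        int[] actual = new int[83];
        Arrays.fill(actual, 1);
        // Copy the labels and flip the first four so that exactly four predictions are wrong.
        int[] predicted = Arrays.copyOf(actual, actual.length);
        for (int i = 0; i < 4; i++) {
            predicted[i] = 2;
        }
        double acc = accuracy(actual, predicted);
        // 79 correct out of 83; prints Accuracy: 95.2%
        System.out.println(String.format(Locale.ROOT, "Accuracy: %.1f%%", acc * 100));
    }
}
```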

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
