WEKA and its data format

Source: Internet
Author: User

http://hi.baidu.com/stockfans/blog/item/489c4b1010584304213f2e98.html

 

File structure

An ARFF file is parsed line by line, so line breaks are significant and you cannot break a line arbitrarily. Empty lines (or lines consisting only of spaces) are ignored.

Lines starting with "%" are comments, and WEKA ignores them. If the "weather.arff" file you see has more or fewer lines starting with "%", this makes no difference.

Once the comments are removed, an ARFF file divides into two parts. The first part is the header information, which contains the relation declaration and the attribute declarations. The second part is the data information, that is, the records of the data set. The data information begins after the "@data" marker.
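The two-part layout described above can be illustrated with a short sketch. The Python below is not WEKA code. It assumes a small hand-written weather-style sample and a hypothetical helper named `split_arff`, and it handles only comment lines, blank lines, and the "@data" marker (matched case-insensitively).

```python
SAMPLE_ARFF = """\
% A comment line: ignored
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute play {yes, no}

@data
sunny,85,no
overcast,83,yes
"""


def split_arff(text):
    """Split ARFF text into (header_lines, data_lines).

    Hypothetical helper for illustration only: it drops comment lines
    (starting with '%') and blank lines, then splits at '@data'.
    """
    header, data = [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank lines and comments are ignored
        if not in_data and line.lower() == "@data":
            in_data = True  # everything after this marker is data
            continue
        (data if in_data else header).append(line)
    return header, data


header, data = split_arff(SAMPLE_ARFF)
print(header)  # the relation and attribute declarations
print(data)    # the comma-separated records
```

The sketch does not validate attribute types or check that each record has the declared number of values; it only shows where the header ends and the data begins.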

Header information

Data information

Convert other formats to ARFF

Although WEKA can read files in other formats, ARFF is the format it supports best. It is therefore worth converting a dataset to ARFF before processing it.

CSV
    • The quickest way to convert CSV to ARFF is to use the command-line tool in WEKA.
Run the main WEKA program and find the "Simple CLI" module in the menu; it provides a command-line interface. In the input box at the bottom of the new window (the area above it cannot be typed into), enter:

 

java weka.core.converters.CSVLoader filename.csv > filename.arff

 

This completes the conversion.
    • WEKA 3.5 provides an "ARFF Viewer" module, which can open a CSV file for browsing and save it as an ARFF file.
    • Go to the "Explorer" module, open the CSV file using the buttons at the top, and save it as an ARFF file.
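To make clear what such a conversion has to produce, here is a minimal Python sketch. It is not WEKA's CSVLoader and does not reproduce its behavior. It is a simplified stand-in built on assumptions: the function name `csv_to_arff` and the relation name are hypothetical, the first CSV row holds the attribute names, a column is declared numeric only if every value parses as a number, and any other column is declared nominal. Missing values and values that need quoting are not handled.

```python
import csv
import io


def csv_to_arff(csv_text, relation="dataset"):
    """Convert CSV text (first row = attribute names) to ARFF text.

    Simplified assumption: a column is 'numeric' if every value parses
    as a float; otherwise it is nominal, listing its distinct values.
    """
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    names, records = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    # Header part: relation declaration, then one declaration per attribute.
    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(names):
        column = [rec[i] for rec in records]
        if all(is_number(v) for v in column):
            lines.append(f"@attribute {name} numeric")
        else:
            labels = ",".join(sorted(set(column)))
            lines.append(f"@attribute {name} {{{labels}}}")

    # Data part: the records follow the @data marker unchanged.
    lines += ["", "@data"]
    lines += [",".join(rec) for rec in records]
    return "\n".join(lines) + "\n"


print(csv_to_arff("outlook,temperature\nsunny,85\nrainy,70\n", "weather"))
```

For real data sets the WEKA tools listed above remain the better choice; the sketch only shows how the CSV header row maps to attribute declarations and the remaining rows to the data part.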

 

C4.5

It is similar to a CSV file.

XLS

An Excel XLS file can hold several two-dimensional tables in different worksheets, and each worksheet must be saved to its own CSV file. Open the XLS file, switch to the worksheet to be converted, and save it as CSV. Excel shows prompts along the way; click "OK" and then "Yes" to dismiss them and complete the operation.

Next, convert the CSV file to ARFF according to the preceding steps.

MAT

A two-dimensional table in MATLAB is a matrix. The following command saves a matrix in CSV format:

csvwrite('filename', matrixname)

Note that a CSV file written by MATLAB usually has no attribute names (a CSV file exported from Excel may lack them as well). WEKA reads the attribute names from the first row of a CSV file; if that row is missing, the values in the first row of data are read as the attribute names. You therefore need to open the CSV file written by MATLAB in a text editor and manually add a row of attribute names. The number of attribute names must equal the number of data columns, and the names are likewise separated by commas.
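Adding the attribute-name row by hand can also be scripted. The following Python sketch assumes a purely numeric, comma-separated file such as the one `csvwrite` produces (no quoted fields); the function name `add_header_row` and the attribute names in the example are hypothetical.

```python
def add_header_row(csv_text, attribute_names):
    """Prepend a row of attribute names to headerless CSV text.

    Raises ValueError if the number of names does not match the
    number of columns in the first data row.
    """
    first_line = csv_text.splitlines()[0]
    n_columns = len(first_line.split(","))  # assumes no quoted commas
    if len(attribute_names) != n_columns:
        raise ValueError(
            f"expected {n_columns} attribute names, got {len(attribute_names)}"
        )
    return ",".join(attribute_names) + "\n" + csv_text


matrix_csv = "1,2,3\n4,5,6\n"
print(add_header_row(matrix_csv, ["a", "b", "c"]))
```

The length check enforces the rule stated above: one attribute name per data column.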

From: http://wiki.wekacn.org/index.php/ARFF

From: http://wiki.wekacn.org/index.php/ARFF#.E6.96.87.E4.BB.B6.E7.BB.93.E6.9E.84

Reprinted: http://www.blog.edu.cn/user2/huangbo929/archives/2007/1789774.shtml

 
