A simple blog post about Weka

Last Update: 2015-05-03. Source: Internet

Author: User

Weka is an open-source pattern recognition and data mining package written in Java, with a history of more than 10 years. Its official website is http://www.cs.waikato.ac.nz/ml/weka/.

Pattern recognition and data mining come down to four questions:

First: What is the problem?

Second: What is the data?

Third: How do we learn?

Fourth: Is the learning result reliable?

The first question comes from requirements. Requirements analysis is difficult. It takes rigorous logic, a deep understanding of the industry at both the macro level and in detail, familiarity with the technical field and with academic progress, and hands-on experience from several successful projects. All four are indispensable, so the work is usually done by a team of experts from different areas.

Weka does not solve the requirements problem.

The second question is data. Each sample corresponds to a Weka Instance, and a dataset made up of multiple samples corresponds to a Weka Instances object, which is the storage container. For a dataset, you need to decide which samples to use for training and testing, and there are many options. For example, you can use only a subset of the samples, handle samples with missing attribute values, use only some of the attributes, or reorder the samples to change the training and test results. Both supervised and unsupervised methods are available for selecting samples and their attributes.
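The data-preparation choices listed above can be sketched in a few lines. The example below is plain Python and does not use Weka's API. The toy dataset, the attribute names, and the helper functions are all hypothetical and only illustrate the ideas of filling in missing values, keeping a subset of attributes, and reordering samples before a train/test split.

```python
import random

# Hypothetical toy dataset: each row is one sample (the analogue of a Weka
# Instance); the list of rows plays the role of an Instances object.
# None marks a missing attribute value.
ATTRIBUTES = ["outlook", "temperature", "humidity", "play"]
ROWS = [
    ["sunny", 85.0, 85.0, "no"],
    ["overcast", 83.0, None, "yes"],
    ["rainy", 70.0, 96.0, "yes"],
    ["rainy", None, 80.0, "yes"],
    ["sunny", 72.0, 95.0, "no"],
    ["overcast", 64.0, 65.0, "yes"],
]


def fill_missing_numeric(rows, column):
    """Replace None in a numeric column with the mean of the known values."""
    known = [row[column] for row in rows if row[column] is not None]
    mean = sum(known) / len(known)
    return [
        row[:column] + [mean if row[column] is None else row[column]] + row[column + 1:]
        for row in rows
    ]


def select_attributes(rows, names, keep):
    """Keep only the named attributes, in the order given by `keep`."""
    indices = [names.index(name) for name in keep]
    return [[row[i] for i in indices] for row in rows]


def shuffle_and_split(rows, train_fraction, seed):
    """Shuffle a copy of the rows with a fixed seed, then split train/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Fill the two numeric columns, drop "outlook", then split half and half.
filled = fill_missing_numeric(fill_missing_numeric(ROWS, 1), 2)
reduced = select_attributes(filled, ATTRIBUTES, ["temperature", "humidity", "play"])
train, test = shuffle_and_split(reduced, train_fraction=0.5, seed=42)
print(len(train), "training samples,", len(test), "test samples")
```

Changing the seed changes which samples land in the training set, which is one reason results can differ from run to run.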

The third question is learning. If the first question can be defined precisely, the answer to the third becomes clear. Weka provides a large number of algorithms for classification, regression, clustering, association rules, and so on. For beginners, choosing an algorithm is a big problem: each algorithm has its advantages, but no single algorithm beats the others on most metrics. The trick is to run many experiments and analyze the results. With enough practice, you come to know what works well.

The fourth question is whether the learner is reliable. The usual approach is cross-validation, either 5-fold or 10-fold. Combined with a grid search over the parameters, this is enough to handle most ordinary problems.
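As a rough illustration of k-fold cross-validation, the sketch below splits sample indices into k folds and scores a trivial majority-label baseline on each held-out fold. It is plain Python rather than Weka code, and the labels, the baseline "learner", and the function names are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_label(labels):
    """Baseline 'learner': predict the most frequent training label."""
    return max(sorted(set(labels)), key=labels.count)


def cross_validate(labels, k, seed=0):
    """Return the per-fold accuracy of the majority-label baseline."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        # Train on every sample that is not in the held-out fold.
        train_labels = [labels[i] for i in range(len(labels)) if i not in held]
        prediction = majority_label(train_labels)
        correct = sum(1 for i in held_out if labels[i] == prediction)
        accuracies.append(correct / len(held_out))
    return accuracies


labels = ["yes"] * 14 + ["no"] * 6  # hypothetical class labels for 20 samples
scores = cross_validate(labels, k=5)
print("per-fold accuracy:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

Every sample is used for testing exactly once, so the mean over the folds is a less optimistic estimate than accuracy measured on the training data.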

For big data, Weka's recommendation is to use the command line to process data and train models, to implement the algorithm yourself in Groovy or Jython if possible, or to use an algorithm that can learn incrementally. In other words, Weka is not ready for big data, so it is best used to solve a single problem.
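To show what learning incrementally means in practice, here is a minimal sketch of a classifier that is updated one sample at a time, so the full dataset never has to fit in memory. It is plain Python and not a Weka algorithm. The nearest-mean classifier, its class name, and the tiny sample stream are hypothetical and chosen only because they are short.

```python
class IncrementalNearestMean:
    """Keeps one running mean per class; memory use does not grow with the stream."""

    def __init__(self):
        self.counts = {}
        self.means = {}

    def update(self, features, label):
        """Fold one sample into the running mean of its class."""
        count = self.counts.get(label, 0) + 1
        mean = self.means.get(label, [0.0] * len(features))
        self.means[label] = [m + (x - m) / count for m, x in zip(mean, features)]
        self.counts[label] = count

    def predict(self, features):
        """Return the class whose mean is closest in squared Euclidean distance."""
        def distance(label):
            return sum((x - m) ** 2 for x, m in zip(features, self.means[label]))
        return min(self.means, key=distance)


def sample_stream():
    """Hypothetical stand-in for reading a large file row by row."""
    yield [1.0, 1.0], "a"
    yield [9.0, 9.0], "b"
    yield [2.0, 0.0], "a"
    yield [8.0, 10.0], "b"


model = IncrementalNearestMean()
for features, label in sample_stream():
    model.update(features, label)  # only one sample is held at a time

print(model.predict([1.5, 1.0]))
print(model.predict([7.0, 8.0]))
```

The same pattern applies to any learner whose state can be updated from a single sample: read a row, update the model, discard the row.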

This article is an English version of an article originally published in Chinese on aliyun.com.
