Hello, WEKA

Last Update:2018-12-06 Source: Internet

Author: User

From http://dreamhead.blogbus.com/logs/16813833.html

WEKA is a data mining package written in Java. Data mining, taken literally, is the process of digging useful information out of data. It covers a lot of ground, so here we look at just one aspect of it, classification, in some detail.
Classification is exactly what the name says: you are handed something and you put it into a class. How do you know which class? Obviously from the experience you already have. Where does a computer get that experience? Only from what people tell it; that is, we have to train the computer with a batch of data. Once trained, the computer has some ability to recognize things and can do simple classification work. Classification comes up often in practice. For example, one of my earlier projects used it to identify vehicles.
The following program shows how to use WEKA for classification.
    import weka.classifiers.Classifier;
    import weka.classifiers.bayes.NaiveBayesMultinomial;
    import weka.core.Attribute;
    import weka.core.FastVector;
    import weka.core.Instance;
    import weka.core.Instances;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class Main {
        // Class labels and attribute names.
        private static final String GOOD = "G";
        private static final String BAD = "B";
        private static final String CATEGORY = "category";
        private static final String TEXT = "text";
        private static final int INIT_CAPACITY = 100;
        // Each pair is {text, category}.
        private static final String[][] TRAINING_DATA = {
            {"Good", GOOD},
            {"Wonderful", GOOD},
            {"Cool", GOOD},
            {"Bad", BAD},
            {"Disaster", BAD},
            {"Terrible", BAD}
        };
        private static final String TEST_DATA = "good";
        private static Filter filter = new StringToWordVector();
        private static Classifier classifier = new NaiveBayesMultinomial();

        public static void main(String[] args) throws Exception {
            // Describe the data: one string attribute and one nominal class attribute.
            FastVector categories = new FastVector();
            categories.addElement(GOOD);
            categories.addElement(BAD);
            FastVector attributes = new FastVector();
            attributes.addElement(new Attribute(TEXT, (FastVector) null));
            attributes.addElement(new Attribute(CATEGORY, categories));
            Instances instances = new Instances("Weka", attributes, INIT_CAPACITY);
            instances.setClassIndex(instances.numAttributes() - 1);

            // Train: turn every text into an Instance and label it.
            for (String[] pair : TRAINING_DATA) {
                String text = pair[0];
                String category = pair[1];
                Instance instance = createInstanceByText(instances, text);
                instance.setClassValue(category);
                instances.add(instance);
            }
            filter.setInputFormat(instances);
            Instances filteredInstances = Filter.useFilter(instances, filter);
            classifier.buildClassifier(filteredInstances);

            // Test
            String testText = TEST_DATA;
            Instance testInstance = createTestInstance(instances.stringFreeStructure(), testText);
            double predicted = classifier.classifyInstance(testInstance);
            String category = instances.classAttribute().value((int) predicted);
            System.out.println(category);
        }

        private static Instance createInstanceByText(Instances data, String text) {
            Attribute textAtt = data.attribute(TEXT);
            int index = textAtt.addStringValue(text);
            // Older WEKA API: from WEKA 3.7 on, Instance is an interface
            // and DenseInstance is the concrete class.
            Instance instance = new Instance(2);
            instance.setValue(textAtt, index);
            instance.setDataset(data);
            return instance;
        }

        private static Instance createTestInstance(Instances data, String text) throws Exception {
            // Run the test text through the same filter that was set up on the training data.
            Instance testInstance = createInstanceByText(data, text);
            filter.input(testInstance);
            return filter.output();
        }
    }
The program has two parts. The first half trains the classifier, and the second half tests it.
To train a classifier, we need to choose a classification algorithm and prepare training data. In WEKA, every classification algorithm is a subclass of Classifier, so the algorithm can be swapped easily without changing the rest of the program.
Anyone with a little background in this area knows that the classification algorithm matters, but what really determines how capable a classifier is is the data used to train it. To get a good classifier, you have to keep adjusting the training data and keep retraining. It is the same with human cognition: the more you have seen, the better you can tell things apart.

In WEKA, training data is held in Instances. As the name suggests, this is the plural of Instance: a single training example is an Instance, and the Instances class exists so that what the individual instances have in common, such as their attributes, can be kept in one place. In the program, to use text as training data we convert each text into an Instance. Likewise, when we test the classifier, we convert the test text into an Instance and then classify it.
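The relationship between a dataset and its rows can be sketched in plain Java. This is only an illustration of the idea, not WEKA's implementation: the names TinyDatasetDemo, StringColumn and TinyDataset are hypothetical. It shows why the program stores an index returned by addStringValue rather than the text itself: the shared column keeps the string table, and each row holds only numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TinyDatasetDemo {
    /** Hypothetical stand-in for a string attribute: a table of distinct values. */
    static class StringColumn {
        private final List<String> values = new ArrayList<>();

        /** Stores the value if it is new and returns its index in the table. */
        int addStringValue(String value) {
            int index = values.indexOf(value);
            if (index < 0) {
                values.add(value);
                index = values.size() - 1;
            }
            return index;
        }

        String value(int index) {
            return values.get(index);
        }
    }

    /** Hypothetical stand-in for a dataset: shared columns plus numeric rows. */
    static class TinyDataset {
        final StringColumn text = new StringColumn();
        final List<String> classLabels = Arrays.asList("G", "B");
        final List<double[]> rows = new ArrayList<>();

        /** Each row is {index of the text, index of the class label}. */
        void add(String textValue, String label) {
            rows.add(new double[] {text.addStringValue(textValue), classLabels.indexOf(label)});
        }
    }

    public static void main(String[] args) {
        TinyDataset data = new TinyDataset();
        data.add("Good", "G");
        data.add("Terrible", "B");
        // Decode the second row back into readable values.
        double[] second = data.rows.get(1);
        System.out.println(data.text.value((int) second[0]) + " -> "
                + data.classLabels.get((int) second[1]));
    }
}
```

Because the column definitions live in the dataset, a row on its own is just an array of numbers; it only becomes meaningful once it is attached to a dataset, which is roughly what the setDataset call in the program is for.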
There is also the concept of a Filter. As with filters elsewhere, it gives us a chance to process the data before the real processing starts. Here it is mainly used to transform the instances: StringToWordVector turns the string attribute into word attributes that the classifier can work with.
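What a string-to-word-vector step does can be sketched in plain Java as a bag-of-words conversion. This is a simplified illustration under stated assumptions, not WEKA's StringToWordVector: the class BagOfWordsSketch and its methods are hypothetical, words are split on whitespace only, and matching is case-sensitive. As in the program, the vocabulary is built from the training texts and then reused for the test text.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BagOfWordsSketch {
    /** Builds the vocabulary: each distinct word gets the next free column index. */
    static Map<String, Integer> buildVocabulary(List<String> texts) {
        Map<String, Integer> vocabulary = new LinkedHashMap<>();
        for (String text : texts) {
            for (String word : text.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    vocabulary.putIfAbsent(word, vocabulary.size());
                }
            }
        }
        return vocabulary;
    }

    /** Turns one text into word counts; words outside the vocabulary are dropped. */
    static int[] toVector(String text, Map<String, Integer> vocabulary) {
        int[] counts = new int[vocabulary.size()];
        for (String word : text.trim().split("\\s+")) {
            Integer column = vocabulary.get(word);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> training = Arrays.asList("Good", "Wonderful", "Terrible");
        Map<String, Integer> vocabulary = buildVocabulary(training);
        // A training word maps to its own column.
        System.out.println(Arrays.toString(toVector("Wonderful", vocabulary)));
        // In this sketch "good" and "Good" are different words, so nothing matches.
        System.out.println(Arrays.toString(toVector("good", vocabulary)));
    }
}
```

In this sketch a test word that never appeared in training produces an all-zero vector, which gives a classifier nothing to go on.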
Once we have a trained classifier, we can use it to classify. The key line is
    classifier.classifyInstance(testInstance);
This call returns the prediction as a double: the index of the class value that the classification algorithm considers most likely. We look that index up in the class attribute to get the category of the data we are testing.
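To make "returns an index as a double" concrete, here is a minimal plain-Java sketch of a multinomial naive Bayes classifier with add-one smoothing. It is an assumption-laden illustration, not WEKA's NaiveBayesMultinomial: the class NaiveBayesSketch and its methods train, classify and label are hypothetical, and words are split on whitespace.

```java
import java.util.HashMap;
import java.util.Map;

class NaiveBayesSketch {
    private final String[] labels;
    private final int[] docsPerClass;
    private final int[] wordsPerClass;
    // For every known word, how often it occurred in each class.
    private final Map<String, int[]> wordCounts = new HashMap<>();
    private int totalDocs = 0;

    NaiveBayesSketch(String... labels) {
        this.labels = labels;
        this.docsPerClass = new int[labels.length];
        this.wordsPerClass = new int[labels.length];
    }

    /** Records one training text under the class with the given index. */
    void train(String text, int classIndex) {
        docsPerClass[classIndex]++;
        totalDocs++;
        for (String word : text.trim().split("\\s+")) {
            wordCounts.computeIfAbsent(word, w -> new int[labels.length])[classIndex]++;
            wordsPerClass[classIndex]++;
        }
    }

    /** Returns the index of the most probable class, as a double. */
    double classify(String text) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < labels.length; c++) {
            // log prior + sum of smoothed log word likelihoods
            double score = Math.log((double) docsPerClass[c] / totalDocs);
            for (String word : text.trim().split("\\s+")) {
                int[] counts = wordCounts.get(word);
                int count = counts == null ? 0 : counts[c];
                score += Math.log((count + 1.0) / (wordsPerClass[c] + wordCounts.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    /** Maps the numeric prediction back to its label. */
    String label(double predicted) {
        return labels[(int) predicted];
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch("G", "B");
        nb.train("Good", 0);
        nb.train("Wonderful", 0);
        nb.train("Cool", 0);
        nb.train("Bad", 1);
        nb.train("Disaster", 1);
        nb.train("Terrible", 1);
        double predicted = nb.classify("Terrible");
        System.out.println(predicted + " -> " + nb.label(predicted)); // prints "1.0 -> B"
    }
}
```

In this sketch, a word that was never seen in training scores the same for every class, so the result falls back to whichever class comes first among equals. That is one way to see why the quality of the training data matters more than the algorithm.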
The code itself is not complicated. As said above, a good classifier needs good data behind it. So if you change the test data, you will find that the classifier implemented here is not powerful at all. Making it powerful inevitably means expanding the training data. But that is not the point of this post: all we wanted was to say hello to WEKA. Going further takes more work.
