The previous article introduced the ARFF format, which is WEKA's own file format. In practice, we usually need to extract or obtain data from other data sources. WEKA supports conversion from CSV files or from databases. The interface is shown in figure
The WEKA installation directo
of each category
Find the most frequent category
Create a rule and assign this category to this attribute value
Calculate the rule's error rate
Select the rule with the smallest error rate
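The steps above can be sketched in plain Java. This is a minimal illustration, not WEKA's own rule learner: the class name `OneRSketch`, its method names, and the row layout (one `String[]` per instance, with the class label in column `classCol`) are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of WEKA: picks the single attribute whose
// "value -> most frequent category" rules make the fewest errors.
final class OneRSketch {

    /** Counts the training errors of the one-attribute rule built on column attr. */
    static int errorsFor(String[][] rows, int attr, int classCol) {
        // attribute value -> (category -> number of rows)
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String[] row : rows) {
            counts.computeIfAbsent(row[attr], k -> new HashMap<>())
                  .merge(row[classCol], 1, Integer::sum);
        }
        int errors = 0;
        for (Map<String, Integer> perCategory : counts.values()) {
            int total = 0;
            int mostFrequent = 0;
            for (int c : perCategory.values()) {
                total += c;
                mostFrequent = Math.max(mostFrequent, c);
            }
            // The rule predicts the most frequent category for this value,
            // so every other row with this value is an error.
            errors += total - mostFrequent;
        }
        return errors;
    }

    /** Returns the index of the attribute with the smallest error count (first one wins ties). */
    static int bestAttribute(String[][] rows, int classCol) {
        int best = -1;
        int bestErrors = Integer.MAX_VALUE;
        for (int attr = 0; attr < rows[0].length; attr++) {
            if (attr == classCol) {
                continue; // the class column is the output, not a candidate attribute
            }
            int errors = errorsFor(rows, attr, classCol);
            if (errors < bestErrors) {
                bestErrors = errors;
                best = attr;
            }
        }
        return best;
    }
}
```

Dividing the error count by the number of rows gives the error rate; since every attribute is scored on the same rows, comparing counts selects the same rule.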
As a simple example, we use the weather dataset that ships with WEKA.
The dataset has five attributes. The last one is the result we want to output, so only four attributes are used as inputs. Outlo
The previous few articles described several basic concepts and two basic algorithms for association rules. In commercial applications, however, writing algorithms matters less than understanding the data, getting a grip on it, and using tools well. The earlier articles were about understanding the algorithms; this article will introduce how to use the open source
house has been inserted.
Listing 3. Housing prices using regression models
sellingPrice = (-26.6882 * 3198) + (7.0551 * 9669) + (43166.0767 * 5) + (42292.0901 * 1) - 21661.1208
sellingPrice = 219,328
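The arithmetic in Listing 3 can be checked with a few lines of Java. The coefficients, the intercept, and the four input values are copied from the listing; the class name `SellingPriceSketch` and the choice to pass the inputs as an array are assumptions made for this sketch, since the excerpt does not name the inputs.

```java
// Hypothetical helper that re-evaluates the regression formula from Listing 3.
final class SellingPriceSketch {

    // Coefficients and intercept as they appear in Listing 3.
    static final double[] COEFFICIENTS = {-26.6882, 7.0551, 43166.0767, 42292.0901};
    static final double INTERCEPT = -21661.1208;

    /** Returns intercept + sum(coefficient[i] * inputs[i]). */
    static double predict(double[] inputs) {
        if (inputs.length != COEFFICIENTS.length) {
            throw new IllegalArgumentException("expected " + COEFFICIENTS.length + " inputs");
        }
        double price = INTERCEPT;
        for (int i = 0; i < inputs.length; i++) {
            price += COEFFICIENTS[i] * inputs[i];
        }
        return price;
    }

    public static void main(String[] args) {
        // Input values taken from Listing 3, in the same order as the coefficients.
        double price = predict(new double[] {3198, 9669, 5, 1});
        System.out.println("sellingPrice = " + Math.round(price));
    }
}
```

Rounded to the nearest whole number, the result agrees with the 219,328 shown in the listing.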
However, looking back at the beginning of this article, we know that data mining is not just about outputting a value: it is about recognition patt
Brief introduction
What is data mining? You will ask yourself this question from time to time, because this topic is getting more and more attention in technical circles. You may have heard that companies like Google and Yahoo! are generating billions of data points about all their users, and you wonder, "What do they want all this information for?" "Y
Weka looks like this when it's open.
4. Allow Weka to recognize Chinese characters. Locate the Weka installation directory (the step above installed it in "D:\Weka-3-6") and find the file RunWeka.ini. Open this file, comment out "fileEncoding=Cp1252" with a #, and enter "fileEncoding=utf-8" on the line below it. Now create
I personally think that jumping straight into data mining algorithms and WEKA is too hasty. I started out by learning data mining methods directly, and some of them are difficult and boring. What I often think about is not the method itself, but "What is this?".
After WEKA
Brief introduction
In the previous two articles of the "Data mining with WEKA" series, I introduced the concept of data mining. If you haven't read data mining with
Brief introduction
In Data mining with WEKA, Part 1: Introduction and regression, I introduced the concept of data mining and the free, open source software Waikato Environment for Knowledge Analysis (WEKA), which can be used to min
to assign each object to a cluster. The cluster centers and the cluster assignments are then updated in turn until convergence.
Here is the code that calls the k-means implementation in the Weka package:
package others;
import java.io.File;
import weka.clusterers.SimpleKMeans;
import weka.core.DistanceFunction;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
public class Arraylisttest
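Independently of the WEKA call above, the assign-and-update loop that k-means repeats until convergence can be sketched in plain Java. This is not WEKA's SimpleKMeans: the class name `KMeans1DSketch`, the restriction to one-dimensional points, and the caller-supplied initial centers are all simplifying assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical one-dimensional k-means sketch, independent of WEKA.
final class KMeans1DSketch {

    /** Alternates assignment and center updates; returns the final centers. */
    static double[] cluster(double[] points, double[] initialCenters, int maxIterations) {
        double[] centers = initialCenters.clone();
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1); // -1 means "not assigned yet"

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest center.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int nearest = 0;
                for (int c = 1; c < centers.length; c++) {
                    if (Math.abs(points[i] - centers[c]) < Math.abs(points[i] - centers[nearest])) {
                        nearest = c;
                    }
                }
                if (assignment[i] != nearest) {
                    assignment[i] = nearest;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }

            // Update step: move each center to the mean of its assigned points.
            double[] sum = new double[centers.length];
            int[] count = new int[centers.length];
            for (int i = 0; i < points.length; i++) {
                sum[assignment[i]] += points[i];
                count[assignment[i]]++;
            }
            for (int c = 0; c < centers.length; c++) {
                if (count[c] > 0) {
                    centers[c] = sum[c] / count[c]; // an empty cluster keeps its old center
                }
            }
        }
        return centers;
    }
}
```

For example, the points 1, 2, 3, 10, 11, 12 with initial centers 1 and 12 settle on centers 2 and 11 after one update.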
very important. It allows you to develop and extend new mining algorithms. In this regard, WEKA (idmer: practically the representative of open-source data mining software) provides comprehensive documentation of its Java functions and class libraries, which makes it well suited to extension. Of course, you must first fully understa
Http://hi.baidu.com/stockfans/blog/item/489c4b1010584304213f2e98.html
File structure
ARFF files are parsed line by line, so you cannot break lines arbitrarily in this file. Empty lines (or lines consisting only of spaces) are ignored.
Lines starting with "%" are comments, and WEKA ignores them. If the "weather.arff" file you see has more or fewer lines starting with "%", it makes no difference.
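The two rules above (skip blank lines, skip lines starting with "%") can be illustrated with a small Java filter. This is only a sketch of the rules as stated here, not WEKA's ARFF parser; the class name `ArffLineFilterSketch` and its method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only the lines of an ARFF file that carry content.
final class ArffLineFilterSketch {

    /** Drops empty or whitespace-only lines and "%" comment lines. */
    static List<String> meaningfulLines(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // empty line, or a line made up only of spaces
            }
            if (line.startsWith("%")) {
                continue; // comment line
            }
            kept.add(line);
        }
        return kept;
    }
}
```

Because comment lines are dropped before anything else happens, adding or removing them leaves the remaining content unchanged.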
After the an
An hour to understand data mining ⑤: data mining steps and common clustering, decision tree, and CRISP-DM concepts
Next Series 4: An hour to understand data mining ①: Resolving common Big Data
The previous article used the open source data mining software Weka to do association rule mining. Weka is convenient and practical, but it cannot handle large data sets: the data does not fit in memory, so giving it more time is usele
My idea of teaching myself machine learning really comes from my interest in data mining. Deep down I have always believed that there is a pattern behind everything, and that different situations merely correspond to different conditions. Finding such a pattern is therefore the most convenient and quickest way to solve a whole class of problems. As a lazy person, of course, I would lik
Data preprocessing includes handling missing values, standardization, normalization, and discretization.
Handling missing values: weka.filters.unsupervised.attribute.ReplaceMissingValues. For numeric attributes, use the mean in place of the missing value. For the nominal attribute, use
It should be difficult to use WEKA for a m training set:
1. Increase the memory size. In fact, WEKA can use not only physical memory but also virtual memory. If the memory available to Java is set to 2 GB while the machine has only 1 GB of physical memory, the operating system will automatically set aside space on the hard disk as virtual memory as needed. However, this process is generally slo
Frequent pattern mining is a kind of mining commonly used in data mining, and Apriori is one frequent pattern mining algorithm. First, let's look at what a frequent pattern is. A frequent pattern is the pattern that of
[Introduction to Data Mining] - Introduction to data types and Data Mining
Data Type
Datasets differ in many respects. For example, attributes describing data objects can have different types: quantitative or qualitative. In addition, a dataset may also have a s
strength is statistical analysis, which provides a wide range of parametric and nonparametric testing methods. It also offers many feature selection methods.
WEKA
WEKA (Waikato Environment for Knowledge Analysis, http://www.cs.waikato.ac.nz/ml/weka/) may be the most famous open source machine learning and data