It's been years since I last ventured to answer the question "How to choose data mining tools". This article makes two main points:
1. There is no best tool, or rather, no tool that is best for everyone.
2. The most useful tool is the one that can handle the vast majority of the data mining tasks you need to perform.
The main data mining tasks
Most data mining practitioners know that 70% to 90% of the effort in a data mining project goes into data preparation. Yet as data mining tools have evolved, the development of data preparation features has been given secondary priority. Finally, you need to be able to evaluate models accurately so that you can compare multiple models and recommend them to marketers.
Data preparation tasks
Common data preparation tasks include:
Assess the data to identify:
Missing values (empty strings, blanks, null values)
Outliers
Collinearity (correlation among input variables)
Merge multiple data sets;
Map metadata (field names and types) from different input formats to a common analysis format;
Transform the values of similar variables into a common format;
Convert variables to meet the input requirements of particular algorithms: numeric variables into categorical ones (through binning and classification), or categorical variables into numeric ones;
Split the value of one variable into multiple fields, or combine multiple fields into one;
Derive new variables from existing ones. Most data mining practitioners find that some of the most predictive variables are derived ones.
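To make a few of these tasks concrete, here is a minimal sketch in plain Python of the assessment step (missing values, outliers, collinearity), binning, and a derived variable. It is an illustration only and does not reflect how any particular tool works. The sample records, the field names, the 1.5 x IQR outlier rule, the use of Pearson correlation as the collinearity check, and the age bands are all assumptions chosen for the example.

```python
from statistics import mean, quantiles

def is_missing(value):
    """Treat None, empty strings, and whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def count_missing(rows, field):
    """Count the records whose value for `field` is missing."""
    return sum(1 for row in rows if is_missing(row.get(field)))

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def pearson(xs, ys):
    """Pearson correlation; values near +1 or -1 hint at collinearity."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def bin_value(value, edges, labels):
    """Map a numeric value to a label; `labels` has one more entry than `edges`."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

# Hypothetical customer records, invented for this example.
rows = [
    {"age": 25, "income": 30000, "spend": 3100, "city": "Oslo"},
    {"age": 32, "income": 42000, "spend": 4150, "city": ""},
    {"age": 41, "income": 51000, "spend": 5200, "city": "  "},
    {"age": 38, "income": 48000, "spend": 4700, "city": None},
    {"age": 29, "income": 39000, "spend": 3950, "city": "Bergen"},
    {"age": 35, "income": 400000, "spend": 4400, "city": "Oslo"},
]

missing_cities = count_missing(rows, "city")
income_outliers = iqr_outliers([r["income"] for r in rows])
age_spend_corr = pearson([r["age"] for r in rows], [r["spend"] for r in rows])

for r in rows:
    # Numeric to categorical: bin age into assumed bands.
    r["age_band"] = bin_value(r["age"], [30, 40], ["under 30", "30-39", "40+"])
    # Derived variable: share of income that is spent.
    r["spend_ratio"] = r["spend"] / r["income"]

print(missing_cities, income_outliers, round(age_spend_corr, 2))
```

A tool that handles data preparation well would let you do each of these steps without dropping into hand-written code like this, which is the yardstick applied to the tools discussed here.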
Most data mining tools treat these data preparation functions as secondary, and this article focuses on evaluating how well common data mining tools handle these tasks.
In addition to supporting the data preparation tasks above, a good data mining tool should also be able to evaluate models, so that the multiple models generated during modeling can be compared, and should support direct marketing.
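As one possible illustration of comparing models for a direct marketing campaign, the sketch below ranks customers by each model's score and computes the lift in the top-scored fraction, that is, the response rate among the customers a model would have you contact divided by the overall response rate. Lift is a measure assumed here for the example, not one this article prescribes, and the scores, responses, and model names are invented.

```python
def top_fraction_lift(scores, responded, fraction=0.2):
    """Response rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    top_rate = sum(r for _, r in ranked[:cutoff]) / cutoff
    overall_rate = sum(responded) / len(responded)
    return top_rate / overall_rate

# Hypothetical outcome for ten customers: 1 = responded to the campaign.
responded = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

# Hypothetical scores from two candidate models for the same ten customers.
model_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.3, 0.1, 0.7, 0.4, 0.25, 0.15, 0.05],
    "model_b": [0.9, 0.8, 0.4, 0.3, 0.2, 0.5, 0.6, 0.1, 0.7, 0.35],
}

lifts = {name: top_fraction_lift(s, responded) for name, s in model_scores.items()}
best_model = max(lifts, key=lifts.get)

for name, lift in lifts.items():
    print(f"{name}: lift in top 20% = {lift:.2f}")
print("recommended:", best_model)
```

A lift above 1 means that contacting only the top-scored customers reaches responders more efficiently than contacting customers at random, which is the kind of comparison a marketer needs before choosing a model.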