Free software related to data mining

Source: Internet
Author: User

Reprinted from Http://reader.dashuai.net/?p=100

Data cleansing tools
DataWrangler

Google Refine

Statistical analysis tools

The R Project for Statistical Computing

Timeflow

Data presentation tools

Google Fusion Tables

Impure

Tableau Public

Many Eyes

VIDI

Zoho Reports

Code helper tools

Choosel

Exhibit

Map-related data display tools

Quantum GIS (QGIS)

OpenHeatMap

OpenLayers

Text processing tools

IBM Word-cloud Generator

Social network analysis tools

Gephi

NodeXL

What is data mining used for? How does it relate to data warehousing? How does it relate to market research and data analysis?

An introductory article

Research shows that the volume of data processed by enterprises multiplies every five years. This leads to heavy duplication and inconsistency in enterprise data, and the need to extract useful information from that data has driven the development of data mining technology.

1. Basic concepts of data mining

In "Data Mining Clementine Applications", Professor Shebangchang defines data mining as "the process of finding information hidden in data (such as trends, features, and correlations)", that is, extracting information or knowledge from data. It is also known as KDD (knowledge discovery in databases).

Data mining sits at the intersection of the following six fields:

A. Database systems: data warehousing and online analytical processing (OLAP)

B. Machine learning

C. Statistics and data analysis methods

D. Visualization

E. Mathematical programming

F. High-performance computing

How does data mining relate to data warehousing? In my view, a data warehouse is a precondition for data mining. The data in a warehouse has usually already been collated, which is what we usually call clean data, and data mining is the process of identifying interesting or valuable information in that data.

2. Application areas of data mining

Data mining is an important part of many companies' strategy, so it is often kept confidential, and it is hard to find out what companies are actually doing with it. The most common application areas include:

A. Customer profile management: companies often want to identify the characteristics their customers share so they can predict who is likely to become a customer. This helps marketers find the right targets, which lowers marketing costs and raises the success rate.

B. Shopping basket analysis: helps retailers understand purchasing behavior, such as which products customers buy together and which product customers tend to buy some time after buying another. With data mining, retailers can plan order quantities and inventory more effectively and decide how to arrange products on the shelves.

C. Customer relationship management: a company can analyze customers who used to be its own but later switched to a competitor, identify the characteristics of this group, and use those characteristics to find existing customers who are likely to switch. It can then design a plan to retain this segment (after all, acquiring a new customer costs much more than retaining an existing one).
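Shopping basket analysis is usually done with association rules, which score "customers who buy A also buy B" by support (how often A and B appear together) and confidence (how often B appears given A). The sketch below is a minimal Python illustration under stated assumptions: the transactions are hypothetical toy data, the function name pair_rules and the min_support threshold are illustrative, and it only scores rules between pairs of products rather than running a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the products in one purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for product pairs.

    support    = fraction of baskets containing both products
    confidence = count(A and B) / count(A), an estimate of P(B | A)
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # A frequent pair yields one rule in each direction.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for a, b, s, c in pair_rules(baskets):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
# first line printed: beer -> diapers: support=0.60, confidence=1.00
```

On this toy data every basket containing beer also contains diapers, so that rule ranks first; a retailer might use such a rule to decide shelf placement.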

Data mining is also used in many other industries, such as finance, telecommunications, retail, and the internet. Common applications are summarized below:

Applications of data mining

Categories: customer-centric, action-centric, research-centric

Applications: lifetime value, shopping basket analysis, profile segmentation, retention, target marketing, customer acquisition, knowledge portals, cross-selling, event management, e-commerce, profitability analysis, pricing, fraud detection, risk assessment, portfolio management, employee turnover, cash management, production efficiency, network performance, manufacturing processes, combinatorial chemistry, genetic research, epidemiological research

3. Steps of data mining and common analysis methods

Everyone's data mining process is different, but almost everyone spends most of their time on data preparation. The overall process roughly follows these steps:

1) Understand the data and the task

2) Acquire the relevant domain knowledge and techniques

3) Integrate and inspect the data

4) Remove erroneous and inconsistent data

5) Develop models and hypotheses

6) Carry out the actual data mining

7) Test and validate the mined results

8) Interpret and use the results
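Since data preparation takes most of the effort, here is a minimal Python sketch of steps 3 and 4, plus the holdout sample that step 7 needs. Everything in it is a hypothetical assumption for illustration: the two source tables, the field names, the function names, and the validity rules (an age between 0 and 120, non-negative spend). Real projects would apply rules specific to their own data.

```python
import random

# Hypothetical records from two source systems (fields are illustrative).
customers = [
    {"id": 1, "age": 34, "city": "Shanghai"},
    {"id": 2, "age": -5, "city": "Beijing"},    # erroneous: impossible age
    {"id": 3, "age": 41, "city": "Shenzhen"},
]
orders = [
    {"order": 101, "id": 1, "spend": 120.0},
    {"order": 102, "id": 2, "spend": 30.0},
    {"order": 103, "id": 3, "spend": 80.5},
    {"order": 103, "id": 3, "spend": 80.5},      # verbatim duplicate
    {"order": 104, "id": 4, "spend": 55.0},      # inconsistent: unknown customer
]


def integrate(customers, orders):
    """Step 3: join orders to customers on id, skipping duplicates and orphans."""
    by_id = {c["id"]: c for c in customers}
    seen_orders, merged = set(), []
    for o in orders:
        if o["order"] in seen_orders:
            continue  # same order number already loaded
        seen_orders.add(o["order"])
        if o["id"] not in by_id:
            continue  # order refers to a customer that does not exist
        merged.append({**by_id[o["id"]], **o})
    return merged


def clean(rows):
    """Step 4: drop records whose values fall outside plausible ranges."""
    return [r for r in rows if 0 < r["age"] < 120 and r["spend"] >= 0]


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Set aside a random sample so step 7 can validate results on unseen data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


rows = clean(integrate(customers, orders))
train, test = holdout_split(rows)
print([r["order"] for r in rows])  # [101, 103]
```

Keeping each step as a separate function makes it easy to inspect what was dropped at each stage, which matters when most of the project time goes into this phase.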

Data mining analysis methods build models from data that imitate the real world, and use those models to describe the patterns and relationships in the data. Commonly used methods include:

1) Classification and clustering methods, such as factor analysis, discriminant analysis, and cluster analysis, as well as decision trees (the two most common decision tree methods are CART, classification and regression trees, and CHAID, chi-square automatic interaction detector)

2) Predictive methods, such as regression, time series analysis, and neural networks

3) Sequence and rule methods, such as association rules and sequence rules
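To make the CART idea concrete: CART grows a tree by repeatedly choosing the binary split that most reduces Gini impurity. The minimal Python sketch below finds only the single best split on one numeric feature, not a full tree with pruning. The data (customer tenure in months and a churn label) and the function names are hypothetical. CHAID, which chooses splits with chi-square tests instead, is not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best binary split x <= t.

    Returns (None, gini(ys)) if no split lowers the impurity.
    """
    n = len(ys)
    best_t, best_score = None, gini(ys)
    # Splitting at the largest value would leave the right side empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical data: months as a customer, and whether the customer churned.
tenure = [1, 2, 3, 10, 12, 15]
churned = ["yes", "yes", "yes", "no", "no", "no"]
print(best_split(tenure, churned))  # (3, 0.0)
```

A full CART implementation would apply best_split recursively to each child node and to every candidate feature, then prune the resulting tree.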

4. Major data mining software

There are at least 30 commonly used data mining packages on the market (all developed outside China; so far I have not found comparable software developed in China), such as MLC++, Clementine, Darwin, Intelligent Miner, SAS Data Mining, S-PLUS, and MATLAB. A few examples:

1) SPSS Clementine, released by SPSS: combines a variety of analysis techniques behind a graphical user interface, including neural networks, association rules, and rule induction.

2) Oracle Darwin, from Oracle Corporation: supports multiple algorithms and runs on a variety of client/server architectures, whether single-processor, symmetric multiprocessor, or massively parallel, so it can be deployed in a wide range of environments.

3) SAS Enterprise Miner, from SAS, a leader in the data mining market: suited to developing data mining applications and to decision support across the whole CRM process.

4) IBM Intelligent Miner, from IBM: the largest and most powerful tool on the market. It achieved the best overall performance in customer evaluation reports and is positioned as a pioneer in enterprise data mining solutions.

Second, data mining and market analysis

The saying "statistical analysis gives you reports to look at; data mining gives you insight" describes the relationship between market analysis and data mining quite accurately. However, data mining only helps business analysts and planners find possible hypotheses in the data; whether those hypotheses are correct and valuable remains to be determined. To get a more definite answer, the enterprise must spend time and experience verifying them, and this gives market research its purpose: design a questionnaire based on the hypotheses, analyze the survey results statistically to produce a report, and then launch new development plans and acquire new customers. The new data feeds back into data mining, forming a virtuous circle of data mining, market research, and statistical analysis.
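As an illustration of the verification step, suppose mining suggested that customers in segment A respond to an offer more often than those in segment B, and a survey was run to check it. One standard way to test such a hypothesis is a two-proportion z-test; the article does not prescribe a specific test, so this choice, the function name, and the survey counts below are all hypothetical. The sketch uses only Python's math module, computing the two-sided p-value as erfc(|z| / sqrt(2)), which equals 2 * (1 - Phi(|z|)).

```python
import math


def two_proportion_z_test(yes_a, n_a, yes_b, n_b):
    """Two-sided z-test of H0: groups A and B have the same 'yes' proportion."""
    p_a, p_b = yes_a / n_a, yes_b / n_b
    # Under H0 both groups share one proportion, estimated from pooled counts.
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical survey: 60 of 200 in segment A and 40 of 200 in segment B said yes.
z, p = two_proportion_z_test(60, 200, 40, 200)
print(f"z={z:.2f}, p={p:.3f}")  # z=2.31, p=0.021
```

A small p-value supports the mined hypothesis, but whether the difference is large enough to act on is a business judgment, which is why the article stresses combining mining with market research.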

In addition, statistics has contributed many new analytical methods to data mining, such as the probabilistic analysis network (PLN) used with neural network techniques, Bayesian networks among mining methods, and the probabilistic evolutionary algorithm (PMEA) among genetic algorithms.

Third, knowledge needed for data mining work

1. Database technology: data mining is the process of finding interesting or useful information in large volumes of data, which involves database operations, so a working knowledge of databases is essential. This is why many data mining practitioners at Chinese companies have computer science backgrounds.

2. Industry knowledge: this is the "relevant knowledge and technology" mentioned in the data mining steps above. Without industry background knowledge, an analysis report built on technique alone is like water without a source.

3. Proficiency with more than one data mining package: in fact, many databases also provide their own analysis functions, such as the IBM and Oracle data mining software mentioned above.

4. Statistics and market analysis: without this knowledge, reports are likely to contain errors, which can seriously skew the results of the analysis.
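As a small example of the database skills mentioned in point 1, the Python sketch below uses the standard-library sqlite3 module to summarize raw order rows per customer, the kind of aggregated view that a mining step usually consumes. The orders table, its columns, and its rows are hypothetical; an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "bread", 3.5),
        (1, "milk", 2.0),
        (2, "beer", 6.0),
        (2, "diapers", 12.0),
        (3, "milk", 2.0),
    ],
)

# Aggregate per customer: order count and total spend, biggest spenders first.
rows = conn.execute(
    """
    SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
    ORDER BY total DESC
    """
).fetchall()
for customer_id, n_orders, total in rows:
    print(customer_id, n_orders, total)
conn.close()
```

Pushing the aggregation into SQL keeps the heavy lifting in the database, so only the summarized rows need to be loaded into the analysis tool.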
