Reprinted from Http://reader.dashuai.net/?p=100
Data cleansing tools
DataWrangler
Google Refine
Statistical analysis tools
The R Project for Statistical Computing
TimeFlow
Data presentation tools
Google Fusion Tables
Impure
Tableau Public
Many Eyes
VIDI
Zoho Reports
Code helper tools
Choosel
Exhibit
Map-related data display tools
Quantum GIS (QGIS)
OpenHeatMap
OpenLayers
Text processing tools
IBM Word-Cloud Generator
Social network tools
Gephi
NodeXL
What is data mining used for? How is data mining related to data warehousing? How is it related to market research and data analysis? ...
An introductory primer
Studies show that the volume of data enterprises process multiplies every five years, leaving enterprise data full of duplication and inconsistency. The need to extract useful information from that data has driven the development of data mining technology.
1. Related concepts of data mining
"Data mining is the process of finding information hidden in data (such as trends, features, and correlations)," writes Professor Xie Bangchang in "Data Mining Clementine Applications". In other words, it means digging information or knowledge out of data, which is also called KDD (Knowledge Discovery in Databases).
Data mining can be described as the convergence of the following six fields:
A. Database systems, data warehousing, and online analytical processing (OLAP)
B. Machine learning
C. Statistics and data analysis methods
D. Visualization
E. Mathematical programming
F. High-performance computing
How is data mining related to data warehousing? In my view, a data warehouse is a precondition for data mining, because the data in a data warehouse has usually already been collated, which is what we call clean data. Data mining is the process of identifying interesting or valuable information in this useful data.
2. Application fields of data mining
Data mining is an important part of a company's strategic planning and is therefore highly confidential, so it is not easy to find out what individual companies are doing with it. Some of the most common application areas are:
A. Customer profile management. Companies often want to identify the characteristics their customers share so that they can predict who is likely to become a customer. This helps marketers find the right targets, which lowers marketing costs and raises the success rate.
B. Shopping basket analysis. This is often used to help retailers understand customers' purchasing behavior, such as which products customers buy together and which products they buy some time after buying another. With data mining, retailers can decide more effectively how much to order or stock and how to arrange products on the shelves.
C. Customer relationship management. A company can analyze customers who were once its own but later moved to a competitor, identify the characteristics of that group, and then use those characteristics to find existing customers who are likely to leave. It can then design a plan to retain that segment (after all, finding a new customer costs far more than retaining an existing one).
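The shopping basket analysis described above can be sketched roughly as follows. The sketch counts how often pairs of products appear in the same transaction and derives a support and confidence value for each pair. The transactions, the product names, and the helper name `pair_rules` are invented for illustration; a real project would use far more data and probably a dedicated library.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent pairs.

    Hypothetical helper. Support is the share of baskets containing both
    items; confidence is the share of baskets with the antecedent that
    also contain the consequent.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first, then alphabetical for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Made-up transactions for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

for antecedent, consequent, support, confidence in pair_rules(baskets):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence, such as one product almost always being bought alongside another, is the kind of finding a retailer might use when deciding stock levels or shelf placement.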
Data mining is also used in many other industries, such as finance, telecommunications, retail, and the internet. Common applications are summarized below:
Applications of data mining
Customer-centric: lifetime value, shopping basket analysis, profiling and segmentation, retention, target marketing, acquisition, knowledge portal, cross-selling, campaign management, e-commerce
Operations-centric: profitability analysis, pricing, fraud detection, risk assessment, portfolio management, employee turnover, cash management, production efficiency, network performance, manufacturing process
Research-centric: combinatorial chemistry, genetic studies, epidemiological studies
3. Steps of data mining and common analysis methods
Everyone's data mining process is different, but it is certain that most of the time goes into the data preparation phase. The remaining steps follow roughly this process:
1) Understand the data and the work to be done
2) Acquire the relevant knowledge and techniques
3) Integrate and inspect the data
4) Remove erroneous and inconsistent data
5) Develop models and hypotheses
6) Carry out the actual data mining
7) Test and validate the mined results
8) Interpret and use the results
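Since data preparation takes most of the time, here is a minimal sketch of steps 3 and 4 (integrating the data, then removing erroneous and inconsistent records). The record layout, the validity rule for `age`, and the function names `integrate` and `clean` are assumptions made for illustration, not part of any particular tool.

```python
def integrate(*sources):
    """Step 3: merge record lists from several sources into one list."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def clean(records):
    """Step 4: drop erroneous rows and conflicting duplicates.

    Assumed rules (illustrative only): age must be an integer between
    0 and 120, and when a customer id appears more than once, the first
    valid record wins.
    """
    seen = {}
    rejected = []
    for record in records:
        age = record.get("age")
        if not isinstance(age, int) or not 0 <= age <= 120:
            rejected.append(record)  # erroneous or missing value
        elif record["id"] in seen:
            rejected.append(record)  # duplicate id, possibly inconsistent
        else:
            seen[record["id"]] = record
    return list(seen.values()), rejected


# Made-up records from two hypothetical source systems.
crm = [{"id": 1, "age": 34}, {"id": 2, "age": -5}]
web = [{"id": 1, "age": 43}, {"id": 3, "age": 51}, {"id": 4, "age": None}]

kept, rejected = clean(integrate(crm, web))
print("kept:", kept)
print("rejected:", rejected)
```

Keeping the rejected records, rather than silently discarding them, makes it possible to inspect why data was removed before moving on to modeling.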
Data mining analysis methods use data to build models that imitate the real world, and then use those models to describe the patterns and relationships in the data. Commonly used methods are:
1) Classification and clustering methods, such as factor analysis, discriminant analysis, and cluster analysis, as well as decision trees (the two most common being CART, Classification and Regression Trees, and CHAID, Chi-Square Automatic Interaction Detector)
2) Predictive methods, such as regression, time series, and neural networks
3) Sequence rule methods, such as association rules and sequential rules
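To make the idea of "building a model from data" concrete, the sketch below fits the simplest predictive model from the list above, a straight-line regression, using ordinary least squares. The figures and the function name `fit_line` are made up for illustration; the commercial tools discussed in this article are not involved.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up monthly figures: advertising spend vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(spend, units)
forecast = slope * 6.0 + intercept  # predict units sold at a spend of 6.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```

The fitted line is the "model"; describing the relationship (the slope) and predicting an unseen value (the forecast) are the two uses the paragraph above refers to.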
4. Main data mining software
There are currently no fewer than 30 data mining software packages in common use on the market (all developed outside China; so far I have not found any developed in China), such as MLC++, Clementine, Darwin, Intelligent Miner, SAS Data Mining, S-Plus, and MATLAB. Here are a few brief examples:
1) SPSS Clementine, released by SPSS Inc. This tool combines a graphical user interface with a variety of analysis techniques, including neural networks, association rules, and rule generation.
2) Oracle Darwin, from Oracle Corporation. Its advantage is support for multiple algorithms that can run on a variety of client-server architectures, whether on a single processor, symmetric multiprocessors, or massively parallel processors, so it covers a wide range of deployments.
3) SAS Enterprise Miner, from SAS, a leader in the data mining market. It is suited to developing data mining applications and decision support applications across the whole of CRM.
4) IBM Intelligent Miner, from IBM. It is described as the largest and most powerful tool on the market, scored best overall in customer evaluation reports, and is positioned as a pioneer in enterprise data mining solutions.
Second: Data mining and market analysis
"Statistical analysis gives you analysis reports to look at; data mining gives you insight" is a fairly accurate description of the relationship between market analysis and data mining. Data mining, however, only helps business analysts and planners uncover possible hypotheses in the data. Whether those hypotheses are correct, and whether they are valuable, remains to be determined. To get more definite answers, the enterprise has to spend time and effort verifying them, and this defines the purpose of its market research: design a questionnaire around the hypotheses, then apply statistical analysis to the survey results to produce analysis reports. Those reports lead to new development plans and new customers, whose data feeds back into data mining, forming a virtuous circle of data mining, market research, and statistical analysis.
In addition, statistical analysis has contributed many new analytical methods to data mining, such as the Probabilistic Analysis Network (PLN) used in neural network technology, Bayesian networks among mining methods, and the probabilistic evolutionary algorithm (PMEA) among genetic algorithms.
Third: Knowledge needed for data mining work
1. Database technology. Data mining is the process of finding interesting or useful information in large amounts of data, which involves database operations, so mastering at least one database is essential. This is why many data mining practitioners at domestic enterprises come from a computer science background.
2. Relevant industry knowledge. This is the "relevant knowledge and techniques" mentioned in the data mining steps above. Without industry background knowledge, an analysis report produced by technique alone is like water without a source.
3. Proficiency in at least one data mining tool. In fact, many database vendors also provide corresponding analysis tools, such as the IBM and Oracle data mining software mentioned above.
4. Knowledge of statistics and market analysis. Without it, reports will contain errors of one kind or another, causing serious deviations in the analysis results.
Free software related to data mining