R Language Data Mining in Practice Series (1)

Last Update: 2017-05-28 Source: Internet

Author: User

I. Fundamentals of Data Mining

Data mining: "panning for gold" in data. It extracts hidden, previously unknown, and potentially valuable relationships, patterns, and trends from large amounts of data (including text), and uses this knowledge and these rules to build models for decision support. The term covers the methods, tools, and processes that provide predictive decision support.

Tasks for Data Mining

Data mining uses classification and prediction, cluster analysis, association rules, time series patterns, deviation detection, intelligent recommendation, and other methods to help enterprises extract the business value contained in their data and improve their competitiveness.

Data Mining Modeling Process

Define the mining target. Decide what the analysis is meant to achieve.

Data sampling. Extract a subset of sample data relevant to the mining target. The criteria for extracting data are relevance, reliability, and validity. The quality of sampled data is measured by (1) completeness: the data are complete and cover the full range of indicators; and (2) accuracy: the data reflect the normal (not abnormal) state. Common sampling methods include random sampling, equidistant sampling, stratified sampling, sampling from the start of the sequence, and classified sampling.
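Three of the sampling methods listed above can be sketched in base R. This is only a minimal illustration: the `customers` data frame, its columns, and the 10% sampling fraction are invented for the example and do not come from the original article.

```r
# Illustrative data: 100 hypothetical customer records in three regions.
set.seed(42)
customers <- data.frame(
  id     = 1:100,
  region = rep(c("north", "south", "east"), times = c(50, 30, 20)),
  spend  = round(runif(100, min = 10, max = 500), 2)
)

# Random sampling: draw 10 rows without replacement.
random_rows <- customers[sample(nrow(customers), size = 10), ]

# Equidistant (systematic) sampling: every 10th row from a random start.
step  <- 10
start <- sample(step, size = 1)
systematic_rows <- customers[seq(from = start, to = nrow(customers), by = step), ]

# Stratified sampling: draw about 10% of the rows within each region,
# so that every region is represented in proportion to its size.
fraction <- 0.1
stratified_rows <- do.call(rbind, lapply(
  split(customers, customers$region),
  function(group) {
    n_draw <- max(1, round(fraction * nrow(group)))
    group[sample(nrow(group), size = n_draw), ]
  }
))

# Number of sampled rows per region.
table(stratified_rows$region)
```

Calling `set.seed()` first makes the draws reproducible, which helps when the same sample has to be regenerated later in the modeling process.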

Data exploration. Data exploration and preprocessing are meant to guarantee the quality of the sample data and thus lay the foundation for the quality of the model. Common data exploration methods include outlier analysis, missing value analysis, correlation analysis, and periodicity analysis.
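As a rough sketch, missing value analysis, outlier analysis, and correlation analysis might look like this in base R. The `sales` data frame is hypothetical, with one missing value and one extreme value planted on purpose, and the 1.5 * IQR rule is just one common outlier criterion chosen for the example. Periodicity analysis is not covered here.

```r
# Illustrative data: hypothetical daily figures with one NA and one extreme value.
sales <- data.frame(
  visitors = c(120, 135, 128, 140, 131, 126, 138, 133, 129, 137),
  revenue  = c(2400, 2710, NA, 2790, 2630, 2500, 9800, 2650, 2590, 2740)
)

# Missing value analysis: count NA entries per column.
missing_counts <- colSums(is.na(sales))

# Outlier analysis: flag values more than 1.5 * IQR beyond the quartiles.
q     <- quantile(sales$revenue, probs = c(0.25, 0.75), na.rm = TRUE)
fence <- 1.5 * (q[2] - q[1])
is_outlier <- !is.na(sales$revenue) &
  (sales$revenue < q[1] - fence | sales$revenue > q[2] + fence)

# Correlation analysis: Pearson correlation on rows that are complete
# and not flagged as outliers.
clean <- sales[!is.na(sales$revenue) & !is_outlier, ]
r <- cor(clean$visitors, clean$revenue)

missing_counts
which(is_outlier)
r
```

Whether a flagged value is truly "bad" data or a genuine extreme is a business judgment; the code only locates candidates for review.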

Data preprocessing. When the sampled data have many dimensions, preprocessing must address how to reduce the dimensionality and how to handle missing values. Commonly used preprocessing methods include data filtering, variable transformation, missing value processing, bad data processing, standardization, principal component analysis, attribute selection, and data reduction.
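The sketch below illustrates three of these preprocessing steps in base R: missing value processing, standardization, and principal component analysis. The `indicators` data frame is simulated for the example, and both median imputation and the 90% variance threshold are assumptions made here, not recommendations from the original article.

```r
# Illustrative data: three hypothetical numeric indicators, one with a gap.
set.seed(1)
x1 <- rnorm(50, mean = 100, sd = 15)
indicators <- data.frame(
  x1 = x1,
  x2 = 2 * x1 + rnorm(50, sd = 5),  # strongly related to x1
  x3 = runif(50, min = 0, max = 1)
)
indicators$x3[7] <- NA

# Missing value processing: replace NA with the column median.
indicators$x3[is.na(indicators$x3)] <- median(indicators$x3, na.rm = TRUE)

# Standardization: centre each column to mean 0 and scale it to sd 1.
standardized <- scale(indicators)

# Principal component analysis on the standardized variables.
pca <- prcomp(indicators, center = TRUE, scale. = TRUE)
variance_share <- pca$sdev^2 / sum(pca$sdev^2)

# Dimension reduction: keep the fewest components that together
# explain at least 90% of the variance.
n_keep  <- which(cumsum(variance_share) >= 0.9)[1]
reduced <- pca$x[, seq_len(n_keep), drop = FALSE]

round(variance_share, 3)
dim(reduced)
```

Standardizing before PCA keeps a variable with large units, such as `x2` here, from dominating the components purely because of its scale.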

Mining modeling. Decide which kind of data mining problem this is (classification, clustering, association rules, time series patterns, or intelligent recommendation) and which algorithm to use to build the model.
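To illustrate how the problem type drives the choice of algorithm, the sketch below fits one clustering model and one classification model with functions from R's standard `stats` package. The built-in `iris` data set, k-means, and logistic regression are choices made for this example only; the original article does not prescribe them.

```r
# Clustering (no target variable): group flowers by their measurements.
set.seed(123)
features <- scale(iris[, 1:4])
clusters <- kmeans(features, centers = 3, nstart = 20)

# Compare the discovered clusters with the known species.
table(cluster = clusters$cluster, species = iris$Species)

# Classification (known target variable): logistic regression that
# separates versicolor from virginica using the petal measurements.
two_species <- droplevels(subset(iris, Species != "setosa"))
classifier <- glm(Species ~ Petal.Length + Petal.Width,
                  data = two_species, family = binomial)

# With a factor response, glm() models the probability of the second
# level, which is "virginica" here.
probability <- predict(classifier, type = "response")
predicted   <- ifelse(probability > 0.5, "virginica", "versicolor")
accuracy    <- mean(predicted == two_species$Species)
accuracy
```

The accuracy above is measured on the same rows used for fitting, so it is optimistic; it only shows that the model runs.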

Model evaluation. Automatically select the best model from the candidate models, then interpret and apply it according to the business context.
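One simple way to pick the best of several candidate models automatically is to compare their errors on held-out data. The sketch below assumes a hold-out split of the built-in `mtcars` data, two hypothetical linear regression candidates, and root mean squared error (RMSE) as the criterion; none of these choices come from the original article.

```r
# Hold out 10 rows for testing and train on the rest.
set.seed(2017)
test_rows <- sample(nrow(mtcars), size = 10)
train <- mtcars[-test_rows, ]
test  <- mtcars[test_rows, ]

# Candidate models, expressed as formulas.
candidates <- list(
  weight_only      = mpg ~ wt,
  weight_and_power = mpg ~ wt + hp
)

# Fit each candidate on the training rows and score it on the test rows.
rmse <- sapply(candidates, function(model_formula) {
  fit <- lm(model_formula, data = train)
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
})

# The best model is the one with the lowest test error.
best <- names(which.min(rmse))
rmse
best
```

A single random split can favor one model by chance, so in practice the comparison is often repeated over several splits before the result is interpreted for the business.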

Common Data Mining Modeling Tools

(1) R

R is a language and environment for statistical computing and graphics. It is an implementation of the S language, which was developed by Rick Becker, John Chambers, and Allan Wilks at Bell Labs.

(2) Python

Python is an easy-to-learn, powerful programming language with efficient high-level data structures and a simple but effective approach to object-oriented programming.

(3) SAS Enterprise Miner

Enterprise Miner (EM) is an integrated data mining system from SAS. It allows different techniques to be used and compared, and it also integrates with complex database management software.

(4) IBM SPSS Modeler

IBM SPSS Modeler encapsulates advanced statistical and data mining techniques for obtaining predictive knowledge and deploying the resulting decision-making solutions into existing business systems and processes. It offers an intuitive user interface, automated data preparation, and proven predictive analytics models.

(5) SQL Server

Microsoft SQL Server integrates a data mining component, Analysis Services. SQL Server 2008 provides nine commonly used data mining algorithms, including decision trees, clustering, Naive Bayes, association rules, time series, neural networks, and linear regression. However, its platform portability is relatively poor.

(6) MATLAB

MATLAB is application software developed by MathWorks in the United States, with strong scientific and engineering computing capabilities. It offers powerful mathematical computation and analysis built on matrix calculation, along with rich graphics and visualization functions and convenient programming facilities.

(7) WEKA

WEKA (Waikato Environment for Knowledge Analysis) is a well-known open-source machine learning and data mining package.

(8) TIPDM

TIPDM (the top data mining platform) is developed in Java. It can obtain data from a variety of data sources and build multiple data mining models. It currently integrates dozens of predictive algorithms and analysis techniques, which cover most of the algorithms supported by the mainstream mining systems at home and abroad.

This article is from the "Rangers" blog. Please retain this source attribution: http://ccnupxz.blog.51cto.com/8803964/1930452

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
