Accurate Data Mining in the Big Data Era: Using the R Language

Source: Internet
Author: User

Teacher Profile:

Gino, now approaching middle age, holds a bachelor's degree in mathematics and applied mathematics and a master's degree in statistics from a prestigious university. He has studied and worked abroad for nearly 20 years, engaging in both the theory and practice of data analysis, and has a strong background in mathematics, statistics, and computing.

He is responsible for data modeling and analysis in a core department of one of the world's top 500 companies. In practice he has solved numerous problems in statistical modeling and data analysis, processing and analyzing data rigorously and precisely, with good results in real applications.

As a data analysis trainer for many years, Gino has developed a teaching approach that uses examples to explain statistical principles and to build familiarity with software operation, so that students quickly understand the principles and can analyze data independently with statistical software.

Having used a variety of programming languages and software packages, Gino ultimately chose R as his tool for statistical analysis. He admires its powerful statistical and econometric capabilities, uses it daily, and has a wealth of practical experience with it.

He has been invited on many occasions to teach the R language (basic and advanced) at the NPC Economic Forum, where he combines theory and practice to help students master the principles and practice of software, statistics, and econometrics.

Course Description:

This course combines the fundamentals of the R language and data mining with their key and difficult points. Drawing on the instructor's learning and working experience, it focuses on practical use and aims to improve students' command of the material.

The course covers the essentials of R and data mining. As the outline shows, it includes both popular data mining methods and cutting-edge algorithms. It also provides a large number of cases drawn from the Internet, marketing, finance, insurance, and other fields, which offer useful reference and guidance for both study and work.

Course Features:

1. This course avoids mathematical formulas as far as possible and follows the sequence "clarify the principle, work through a specific case, go over the R implementation details". Even students with almost no background can thus master the basic ideas and patterns of data mining in R, lay a good foundation for further study, and start solving specific problems from their own work and study right away;

2. The course emphasizes combining learning with practice. It uses group discussion to get everyone thinking actively, so that students acquire the relevant knowledge and skills by doing;

3. The course focuses closely on the key and difficult points of R and data mining, analyzing and explaining them in detail and repeatedly flagging the places that are hard to understand or prone to error, which makes it easier for students to review and extend the material after class.

Target group:

This course is for anyone interested in using R for data mining, especially students and working professionals who want to learn the fundamentals and ideas of R and data mining and apply R to practical problems.


First Lecture: The essentials of the R language

To cover the important and useful fundamentals of R step by step, this lecture starts with an introduction to the language. It centers on data preprocessing and uses real cases throughout, covering R vectors, data frames, matrix operations, and the handling of missing values and zero values. It places particular emphasis on writing custom R functions to solve practical problems, and describes in detail commonly used data cleaning packages such as plyr, zoo, and car, as well as the powerful plotting package ggplot2, laying a solid tool foundation for data mining with R.

Main cases:

Case 1: How to merge, sort, and analyze data and compute the Shannon-Wiener index with plyr and other R packages;

Case 2: How to write R code that outputs dozens of complex data analysis graphics as JPEG files in a single run;

Case 3: How to use R for stratified or cluster sampling to build training sets and test sets;

Case 4: Using ggplot2 to draw a variety of complex graphics.
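The stratified sampling idea in Case 3 can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the course's own material: it uses the built-in iris data set, and the 70/30 split and the variable names (train_fraction, train_set, test_set) are arbitrary choices.

```r
# Minimal sketch: stratified 70/30 train/test split using base R only.
# The iris data set ships with R; the 70% fraction is an arbitrary choice.
set.seed(42)  # make the random split reproducible

train_fraction <- 0.7

# Group row indices by class, then sample the same fraction within each class.
idx_by_class <- split(seq_len(nrow(iris)), iris$Species)
train_idx <- unlist(lapply(idx_by_class, function(idx) {
  sample(idx, size = round(train_fraction * length(idx)))
}))

train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# Each species keeps the same share in both sets.
print(table(train_set$Species))
print(table(test_set$Species))
```

Sampling within each class, rather than from the whole table at once, keeps the class proportions of the training and test sets equal to those of the full data.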

Second Lecture: Logistic regression and commercial big data modeling

Logistic regression is one of the most important data mining methods in commercial modeling. This lecture covers its modeling principle, how it differs from the multiple linear model, its implementation in R, and the points to consider in regression diagnostics, prediction, and interpretation of the results, so that students master the R approach to solving problems with logistic regression.
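As a rough illustration of the points above, the sketch below fits a logistic regression with base R's glm(). Unlike a linear model, the linear predictor here describes the log-odds of a binary outcome. The data are simulated, and every name and number (income, repaid_early, the coefficients -5 and 0.1) is a hypothetical assumption made for this example only.

```r
# Minimal sketch: logistic regression with glm() on SIMULATED data.
# All variable names and numbers below are hypothetical, for illustration only.
set.seed(1)

n <- 500
income <- rnorm(n, mean = 50, sd = 10)       # made-up predictor
true_prob <- plogis(-5 + 0.1 * income)       # assumed true relationship
repaid_early <- rbinom(n, size = 1, prob = true_prob)
loans <- data.frame(income = income, repaid_early = repaid_early)

# family = binomial fits a logistic (logit link) model.
fit <- glm(repaid_early ~ income, data = loans, family = binomial)

# Coefficients are on the log-odds scale; exponentiate to get odds ratios.
odds_ratios <- exp(coef(fit))

# Predicted probabilities for two hypothetical new customers.
new_customers <- data.frame(income = c(40, 60))
predicted <- predict(fit, newdata = new_customers, type = "response")

print(summary(fit)$coefficients)
print(odds_ratios)
print(predicted)
```

With type = "response", predict() returns probabilities between 0 and 1 rather than log-odds, which is usually the form needed for scoring customers.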

Main cases:

Case 1: Using logistic regression to help commercial banks predict customers' early loan repayment;

Case 2: Using logistic regression to help doctors choose the best treatment plan for patients;

Case 3: Using logistic regression to help manufacturers analyze the important factors in customers' purchase decisions;

Case 4: Using logistic regression to help a life insurance company run precisely targeted telemarketing;

Case 5: Using logistic regression to help commercial banks complete credit scoring for customers;

Case 6: Using logistic regression to help companies analyze the causes of customer churn and make predictions.

Third Lecture: Association rules and their implementation in R

Association rules (the famous "beer and diapers" example) are one of the basic, core techniques of data mining. This lecture focuses on the classical Apriori and Eclat algorithms, explains the support, confidence, and lift of association rules and how to control them, and uses R to complete association rule analysis quickly.
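The three measures named above can be computed by hand for a single rule, which makes their meaning concrete. The sketch below uses base R and a tiny made-up transaction list (the items and baskets are hypothetical). In practice a dedicated package such as arules, with its apriori() and eclat() functions, would be the more usual route; this example only shows what the numbers mean.

```r
# Minimal sketch: support, confidence, and lift for the rule {diapers} => {beer}.
# The transactions are toy data invented for illustration.
transactions <- list(
  c("beer", "diapers", "chips"),
  c("beer", "diapers"),
  c("diapers", "milk"),
  c("beer", "chips"),
  c("milk", "bread")
)

# Fraction of transactions that contain every item in `items`.
support <- function(items) {
  mean(vapply(transactions, function(tx) all(items %in% tx), logical(1)))
}

lhs <- "diapers"
rhs <- "beer"

rule_support    <- support(c(lhs, rhs))        # P(diapers and beer)
rule_confidence <- rule_support / support(lhs) # P(beer | diapers)
rule_lift       <- rule_confidence / support(rhs)  # confidence vs. baseline P(beer)

print(c(support = rule_support,
        confidence = rule_confidence,
        lift = rule_lift))
```

A lift above 1 means the two items appear together more often than they would if they were independent; a lift near 1 suggests the rule adds little information.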

Main cases:

Case 1: Using association rules in R to help supermarkets find the best product bundles (i.e., "shopping basket" analysis);

Case 2: Association rule analysis of the survival of Titanic passengers;

Case 3: Association analysis of the factors that raise personal income.

Fourth Lecture: Decision tree (regression tree) analysis and implementation in R

Decision trees are a classical data mining method whose principle is easy to understand. This lecture focuses on two of the most common decision tree algorithms, CART and C4.5, which are analyzed in R with the rpart and J48 functions.

Main cases:

Case 1: Decision tree analysis of automobile fuel consumption and prediction of the related target variables;

Case 2: Using decision trees to help a telecom company assess and predict customers' broadband business.

Fifth Lecture: Bagging and AdaBoost algorithms for ensemble learning

Both methods aggregate the predictions of many classifiers to achieve a significant improvement in classification performance. This lecture introduces the ideas behind the two algorithms, then constructs training and test sets and carries out the analysis in R.

Main cases:

Case 1: Using bagging and AdaBoost in R to analyze and predict term deposits at commercial banks;

Case 2: Identifying poisonous mushrooms with bagging and AdaBoost in R.

Sixth Lecture: The random forest (randomForest) algorithm in R

In machine learning, a random forest is a classifier that contains multiple decision trees. This lecture discusses the principle of the random forest method, to help students judge when random forest analysis is appropriate and to master random forest analysis in R.

Main cases:

Case 1: Classification and prediction of skin diseases with random forests;

Case 2: Classification and evaluation of the quality and type of wine.

Seventh Lecture: Support vector machines and their implementation in R

This lecture analyzes the structural risk minimization principle, the margin, and the kernel functions of support vector machines, to help learners understand the ideas and algorithm behind them, along with the points that need attention in use, so that they can apply them flexibly in various fields.

Main cases:

Case 1: Analysis of the famous iris data set with a support vector machine;

Case 2: Identification of poisonous mushrooms using support vector machines;

Case 3: Stock index prediction using support vector machines.

Eighth Lecture: Neural networks and their implementation in R

A neural network consists of a large number of nodes and their output functions, which together form a logical strategy. This lecture covers the principle and explains, mainly through cases, the process of implementing neural network algorithms in R and the points that need attention.

Main cases:

Case 1: Analysis and prediction of the quality and type of wine with a neural network;

Case 2: Corporate financial early warning model.

Ninth Lecture: Comparing models with cross-validation

Many models may fit the same data, so how do we measure and compare their accuracy? This lecture introduces cross-validation with training and test sets to help you select the best model for fitting and prediction in practice.
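One common way to make such a comparison is k-fold cross-validation. The sketch below, in base R, compares two linear models on the built-in mtcars data by their cross-validated root mean squared error. The choice of k = 5, the two formulas, and the use of lm() are assumptions made for illustration; the same loop works for other model types.

```r
# Minimal sketch: 5-fold cross-validation to compare two candidate models.
# mtcars ships with R; the formulas and k are arbitrary illustrative choices.
set.seed(123)
k <- 5

# Randomly assign each row of mtcars to one of k folds.
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

formulas <- list(
  simple = mpg ~ wt,
  larger = mpg ~ wt + hp + qsec
)

# For each model: fit on k-1 folds, measure squared error on the held-out fold.
cv_rmse <- sapply(formulas, function(f) {
  fold_mse <- sapply(seq_len(k), function(i) {
    train <- mtcars[folds != i, ]
    test  <- mtcars[folds == i, ]
    fit   <- lm(f, data = train)
    mean((test$mpg - predict(fit, newdata = test))^2)
  })
  sqrt(mean(fold_mse))
})

print(cv_rmse)
print(names(which.min(cv_rmse)))  # model with the lowest cross-validated error
```

Because every observation is used for testing exactly once, the resulting error estimate reflects prediction on unseen data rather than fit to the training data.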

Tenth Lecture: Text mining using R with the KNN algorithm

Text mining, especially Chinese text mining, is becoming increasingly important. This lecture introduces the principles and methods of text mining and shows how to use R to find valuable information in large amounts of unstructured data, extract potentially useful data, discover suitable patterns, and visualize the results.

Main cases:

Case: Text mining of web pages (including word segmentation, classification, visualization, etc.) using R combined with the KNN algorithm.
