An Introduction to the Data Science Series from Johns Hopkins University
Over the past few months, using Andrew Ng's Stanford machine learning lecture notes as a reference, I have written a series of summary notes on machine learning and data mining on my CSDN blog (the notes on independent component analysis and reinforcement learning are not yet finished). In the process I gained a new understanding of the statistics and data mining I had studied before. Many formulas are best derived by hand, since that is the way to truly deepen understanding and make it stick. While studying, however, I also noticed two major shortcomings in myself: 1. My theoretical foundation is weak. For example, a few days ago, after summarizing the concepts of Bayesian machine learning, I realized that I had not really grasped the core ideas of the Bayesian school and only knew Bayes' formula. 2. I lack practical experience. Faced with a real problem, I get stuck choosing among methods when designing a concrete solution. Data mining is not only a theoretical science but also an empirical one.
A few days ago, Coursera, the well-known free open-course website, launched a set of Specializations, one of which is the Data Science series offered by the Johns Hopkins Bloomberg School of Public Health. I marvel at how people abroad share the best education in the world, and I do not understand why many experts and professors in China refuse to share their lecture materials with students (I believe everyone has had the experience of asking for the slides after a lecture and being turned down).
With such a good course available, I hope to improve my abilities and make up for these shortcomings by studying it. Below, drawing on the official course website, I briefly introduce the purpose and content of the series:
I. What will you learn?
(1) Formulate context-relevant questions and hypotheses to drive data science research;
(2) Identify, obtain, and transform data so that it is suitable for producing statistical evidence communicated in written form;
(3) Build models based on new data types, experimental design, and statistical inference.
II. Course content
This series of courses uses the R language as its tool and is divided into nine parts:
(1) The Data Scientist's Toolbox
(2) R Programming
(3) Getting and Cleaning Data
(4) Exploratory Data Analysis
(5) Reproducible Research
(6) Statistical Inference
(7) Regression Models
(8) Practical Machine Learning
(9) Developing Data Products
III. General requirements
The course handout includes a diagram of the capabilities that data science requires. It shows that becoming a data scientist takes computer skills, knowledge of mathematics and statistics, and domain expertise all at once.
Finally, I have attached a road map for becoming a data scientist. It is not part of this course's handout, but it lays out the specific knowledge and skills a data scientist needs to master. As the figure shows, becoming a data scientist is a long journey.
Making progress takes persistence. I will keep studying and record my notes as I work through this course.