First, a few words of preamble. Somehow I went from a pure engineering degree to a management job, and from there into studying data mining. I am learning everything from scratch: I cannot write code (I took a C++ course, but all I remember is its name), I have not touched mathematics in years, and I have never studied discrete mathematics, so this road has been a struggle.
There is not much else to say. Rather than spend time complaining, it is better to get to work right away. It took me a little under a month to work out where I fall short (in fact, everywhere), and I am writing down here what I need to catch up on. It is a lot, but it is time to give it everything!
P.S.: I have a regular job, and work is busy. I have been following this plan for 2-3 months and am starting to get a feel for it. To be honest, I have not really settled down to study during these two or three months; I estimate I can get started within six months.
Without further ado, here is my study list. I am starting from zero, so some items are very basic; experts, please bear with me.
1. Basic knowledge of data mining
This part is mainly reading, to get an overview first. As for which book to use, the one with the best reputation is "Data Mining: Concepts and Techniques (3rd edition)". I bought it, read more than half, and feel it is not suitable for beginners. Some concepts are introduced without preparation, which is abrupt for readers with a weak foundation. If you grit your teeth and push through, it is very rewarding, and it is well suited to repeated reading once you have the basics. I recommend a rather old book, "Data Warehouse and Data Mining", by Takemori et al. It is not so thick, it discusses many basic concepts, and it is very friendly to beginners.
This part runs through the whole learning process. As experienced predecessors have told me, you gain something every time you pick the book up again.
2. Fundamentals of Mathematics
This part is also indispensable. You may not feel the benefit while you are learning it, but once learned it pays off endlessly. My plan is to intersperse it throughout the learning process. The main content is linear algebra and discrete mathematics.
(1) Linear algebra
Whether or not you have studied it before, study it carefully. I personally feel that domestic textbooks do not explain the concepts thoroughly: for example, what eigenvalues and eigenvectors are actually for, or what matrix multiplication essentially means.
I recommend the MIT open course on linear algebra. NetEase Open Class has a well-translated version. Link:
http://open.163.com/special/opencourse/daishu.html
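As a small illustration of the two questions raised above, here is a minimal sketch in plain Python. The matrices A and B are made-up examples, not taken from the course. It shows that an eigenvector is a direction the matrix only stretches, and that multiplying two matrices corresponds to applying one linear map after the other.

```python
def mat_vec(A, v):
    """Apply the linear map A to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_mul(A, B):
    """Matrix product AB: the map 'apply B first, then A'."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

# Hypothetical example matrices (assumptions for illustration only).
A = [[2, 1], [1, 2]]
B = [[0, -1], [1, 0]]  # rotation by 90 degrees counterclockwise

# Eigenvectors: A only scales these directions, it does not turn them.
print(mat_vec(A, [1, 1]))   # [3, 3]  -> eigenvalue 3
print(mat_vec(A, [1, -1]))  # [1, -1] -> eigenvalue 1

# Matrix multiplication as composition: (AB)v equals A(Bv).
v = [1, 1]
print(mat_vec(mat_mul(A, B), v) == mat_vec(A, mat_vec(B, v)))  # True
```

In practice a library such as NumPy would do this work; the hand-written version is only meant to make the definitions visible.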
(2) Discrete mathematics
Most people outside the field have never studied it and get a headache just hearing about it. Don't worry, you do not need to learn all of it; focus on graph theory, algebraic systems, propositional and predicate logic, and sets and relations. Find yourself a thin textbook. You have actually met most of this content in high school or as an undergraduate; what you need to understand is mainly the logical notation and the way of thinking. Otherwise you will run into unfamiliar symbols, fail to follow them, and assume that a simple formula is very complicated, which is not worth it.
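To show how little is hidden behind the notation, here is a minimal sketch in plain Python; the set A and the relation R are made-up examples. It checks a propositional equivalence by truth table and tests three standard properties of a relation represented as a set of pairs.

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

# Propositional logic: check by truth table that p -> q
# is equivalent to its contrapositive (not q) -> (not p).
contrapositive_equivalent = all(
    implies(p, q) == implies(not q, not p)
    for p, q in product([False, True], repeat=2)
)

# Sets and relations: "less than or equal" on A, written as a set of pairs.
A = {1, 2, 3}
R = {(a, b) for a in A for b in A if a <= b}

def is_reflexive(relation, base_set):
    return all((a, a) in relation for a in base_set)

def is_symmetric(relation):
    return all((b, a) in relation for a, b in relation)

def is_transitive(relation):
    return all(
        (a, d) in relation
        for a, b in relation
        for c, d in relation
        if b == c
    )

print(contrapositive_equivalent)  # True
print(is_reflexive(R, A), is_symmetric(R), is_transitive(R))  # True False True
```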
(3) Operations Research
This is definitely a foundation course; it comes last in this list only because I have already studied it seriously. The recommended textbook is "Operations Research" by the "Operations Research" textbook writing group, a big thick green book. You can skip game theory and similar chapters. If circumstances allow, implement the algorithms and run them yourself. It is absolutely rewarding.
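As one example of running an operations research algorithm yourself, here is a minimal sketch in plain Python that solves a tiny two-variable linear program by checking every corner of the feasible region. The problem data are a made-up toy example, and corner enumeration is swapped in here for the simplex method a textbook would teach; it only works for small, bounded problems.

```python
from fractions import Fraction
from itertools import combinations

# Toy problem (an assumption for illustration):
# maximize 3x + 5y subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
# Each constraint is stored as (a, b, c), meaning a*x + b*y <= c.
constraints = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]
objective = (3, 5)

def solve_lp(constraints, objective):
    """Return the best corner point and its objective value, or (None, None)."""
    best_point, best_value = None, None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines never intersect
        # Cramer's rule for the intersection of the two boundary lines.
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y <= c for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if best_value is None or value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value

point, value = solve_lp(constraints, objective)
print(point, value)
```

For this toy problem the best corner is x = 2, y = 6, with objective value 36.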
3. Tools
I searched online for a long time on this part and asked my research group hundreds of times before finally settling on these few. Many people say that with programming experience, picking up a language takes a week or two; unfortunately, I am starting from zero. So this part is definitely a priority. The languages I settled on are MATLAB, Python, and R.
(1) MATLAB
MATLAB first. Don't tell me it is old, or that it is only used for academic work at universities; I don't want to start an argument. The main reason is that it is easy to get started with. Once you have started, you can run some algorithms, which builds confidence and makes learning enjoyable. I found a thick textbook (and never opened it). What I mainly read was the official getting-started guide. Then I began writing scripts and functions; whenever I did not understand something, I went straight to Baidu, Google, or the built-in help, which is written very clearly. The point of this part is to get started quickly, and I have made a little progress.
(2) Python and R
I put these two together because there is too much debate online about them, and I got lost in it countless times. Each really has its own strengths, so I will not argue the pros and cons. My order is Python first, which I have decided to make my main language. Next is R, starting with plotting; R's plots really are nice. As for the approach: first find an introductory book, the easier the better; after that find a reference manual; then practice.
First Python: read "Head First Python". It is pretty good and easy to understand, and the English PDF can still be found online. Then "Python for Data Analysis" and "Machine Learning in Action". The first book is mainly about doing data analysis with Python; almost every discussion of learning Python recommends it. The second is a classic for understanding machine learning, and the language used in the book is Python, so you learn the language while learning machine learning. That makes a good order.
Second R: with the foundation from the first part, it will be easier to learn. The main recommended textbooks are "A Beginner's Guide to R" and "R in Action". I plan to skip around in these, using R mainly for plotting at first and then gradually studying it in depth. Here too, practice is what counts, not just reading.
(3) MySQL
Finally, one addition: learn a little MySQL. Starting from zero, I did not understand how the various kinds of data are stored, so I highly recommend spending a week reading "In-depth MySQL". It is not very difficult; the goal is mainly to get started. If you need more later, study it further.
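To give a feel for what a first week of SQL covers, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL, so that it runs without a database server; the table name, columns, and rows are made up for illustration. The CREATE TABLE, INSERT, and SELECT ... GROUP BY statements shown are basic SQL that carries over to MySQL in very similar form.

```python
import sqlite3

# In-memory SQLite database (a stand-in for MySQL, assumed for illustration).
conn = sqlite3.connect(":memory:")

# Hypothetical table of orders.
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 7.5)],
)

# Total amount per customer, sorted by customer name.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(rows)  # [('alice', 17.5), ('bob', 5.0)]
```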
To repeat: at this stage the key is still to find programs to write. If you have a job or a project, dive straight in; that is the fastest way to learn. If not, find a good, interesting doctoral thesis and run through the programs in it. This part is not learned by reading; it is learned by practice.
4. Algorithms
There are too many algorithms to cover, so stick to the common ones. On the one hand, understand each algorithm. On the other hand, implement and run it in the languages above. That way you understand the algorithm and also become more familiar with the language.
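As an example of "understand it, then run it", here is a minimal sketch of k-nearest-neighbors classification in plain Python (it needs Python 3.8 or later for math.dist). The training points and labels are made up for illustration; k-nearest neighbors is chosen here only because it is one of the simplest common algorithms.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (point, label) pairs.
train = [
    ((1.0, 1.1), "A"),
    ((1.0, 1.0), "A"),
    ((0.0, 0.0), "B"),
    ((0.0, 0.1), "B"),
]

print(knn_predict(train, (0.9, 0.9)))  # A
print(knn_predict(train, (0.1, 0.2)))  # B
```

Writing even a short version like this forces you to decide on a distance measure, a value of k, and a tie-breaking rule, which is exactly where the understanding comes from.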
5. Text Editor
Straight to the point: Emacs Org-mode. This is something to learn in order to reach a higher stage. It comes last not because it is hard or because it is bad, but mainly because it is not my current priority. Watching other people use it, it looks really good and makes me itch to try. So I will park it here for now.
Summary
Throughout the introductory phase, never fixate on learning one thing at a time. Learn in parallel! For example, while getting started with a language, look up algorithms to practice on. When you read about an algorithm, be sure to turn it into a running program. When you get tired in between, fill in some of the mathematical foundations. When reading about an algorithm, look up on Baidu whatever mathematics you do not understand.
In one sentence: practice repeatedly, and get started within half a year.
(If you need anything, feel free to get in touch; I can share whatever materials I have.)
Zero-Basis Data Mining Learning Checklist