Data Mining: From Entry Level to Advanced Level

Source: Internet
Author: User

I have been working in data mining for some years. I wrote this article partly to give a friend a reference for getting into data mining, and partly in the hope of exchanging ideas with more experienced people so that we can learn from one another. Please forgive anything here that makes the experts laugh.

Entry level:

Entry-level books on data mining generally include:

Jiawei Han's "Data Mining: Concepts and Techniques"

Ian H. Witten / Eibe Frank's "Data Mining: Practical Machine Learning Tools and Techniques"

Tom Mitchell's "Machine Learning"

Toby Segaran's "Programming Collective Intelligence"

Anand Rajaraman's "Mining of Massive Datasets"

Pang-Ning Tan's "Introduction to Data Mining"

Matthew A. Russell's "Mining the Social Web"

For many people, the first data mining book they read is Jiawei Han's "Data Mining: Concepts and Techniques". It is also the entry-level book recommended by our boss (I personally think he recommends it because Han was his teacher). In fact, I do not recommend it as a first book. It covers almost everything, including topics that few other books touch, such as OLAP, but it is not friendly to beginners and reads like a textbook. Even if you have the perseverance to finish it, you come away with only a set of fragmented concepts and still find it hard to get started on a real project.

I personally recommend these two books instead: Toby Segaran's "Programming Collective Intelligence" and Ian H. Witten / Eibe Frank's "Data Mining: Practical Machine Learning Tools and Techniques".

"Programming Collective Intelligence" suits programmers who want to learn data mining techniques. It describes many practical data mining algorithms and, most importantly, it does not lecture pedantically the way Han's book does. Instead it starts from concrete examples backed by Python code, so you quickly see which real problems each algorithm applies to and can write the code yourself. Its drawbacks are that it is not deep (there is essentially no mathematical derivation), not comprehensive, and somewhat thin on content. For an entry-level book, though, these very shortcomings make it easier to understand and to get started.

The other recommended book, "Data Mining: Practical Machine Learning Tools and Techniques", is a little harder than the previous one. In terms of readability, however, it is still well ahead of Han's book. Its authors are also the authors of the well-known Weka. The book moves from easy to hard, expanding from simple models to algorithms for real-world problems. Most valuable of all, it also spends some time on how to use Weka, so you can run small experiments in Weka and build an intuitive understanding of the algorithms.

After reading the two books above, you should have a preliminary understanding of data mining in general. How to go on from there depends on your own needs.

If you only want to know a little about the related technologies, or treat data mining as a hobby, you can look at Anand Rajaraman's "Mining of Massive Datasets" and Matthew A. Russell's "Mining the Social Web". The former is based on the materials of Stanford's "Web mining" course. It picks out many small topics in data mining and expands on them. It is not very systematic, but it is quite good, so it is best read once you already have a preliminary understanding. The same is true of the latter. Unfortunately, many of the APIs it uses cannot be tested directly because of the GFW.

If you want to go on to research, I think you should first work through Tom Mitchell's "Machine Learning". The book can be seen as a summary of machine learning as it stood more than a decade ago. The author describes many of the algorithms that were popular at the time briefly and clearly, and explains in detail where each one applies and what its characteristics are. It is a thin book, yet it takes you on a tour of machine learning.

Advanced:

This part is hard to write. After all, everyone understands "advanced" differently, and it is a matter of opinion. For my part, I recommend expanding in the following directions:

Video Learning:

You can watch the videos of Stanford's machine learning course. I heard recently that NetEase has translated the entire open course and provides bilingual subtitles, which makes it even easier to learn ^_^.

Books:

My personal recommendation is to start with Li Hang's "Statistical Learning Methods". The book focuses on mathematical derivation and is fairly short, so it lets you quickly gain a deeper understanding of a number of algorithms.

With that book as a foundation, you can start chewing through some of the classics. These famous books can be read in any order, or in parallel:

Richard O. Duda's "Pattern Classification" is a reference work. Many colleges and universities use it as their introductory data mining textbook (it was also my own introduction to data mining). If you have not read it, you will find that when you study many problems, even some relatively simple ones (for example, why the Bayes classifier reduces to a linear classifier under Gaussian assumptions), you have to go back and read this book.
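The example question mentioned above can be checked numerically. What follows is a minimal Python sketch, not taken from the book. It assumes two classes whose Gaussian class-conditional densities share a single covariance matrix, which is the case in which the decision boundary is linear. The means, covariance, and priors are made-up illustrative values. The sketch compares the log posterior odds computed directly from Bayes' rule with the linear form w·x + b.

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """Log of the multivariate normal density N(x | mean, cov)."""
    d = len(mean)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def bayes_log_odds(x, mu0, mu1, cov, prior0, prior1):
    """log P(class 1 | x) - log P(class 0 | x), straight from Bayes' rule."""
    return (gaussian_log_density(x, mu1, cov) + np.log(prior1)
            - gaussian_log_density(x, mu0, cov) - np.log(prior0))

def linear_form(mu0, mu1, cov, prior0, prior1):
    """Weights w and bias b such that the log odds equal w @ x + b.

    The quadratic terms in x cancel only because both classes share cov.
    """
    w = np.linalg.solve(cov, mu1 - mu0)
    b = (-0.5 * mu1 @ np.linalg.solve(cov, mu1)
         + 0.5 * mu0 @ np.linalg.solve(cov, mu0)
         + np.log(prior1 / prior0))
    return w, b

# Illustrative (assumed) parameters for two classes in two dimensions.
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
cov = np.array([[1.0, 0.3],
                [0.3, 2.0]])
prior0, prior1 = 0.6, 0.4

w, b = linear_form(mu0, mu1, cov, prior0, prior1)

# Compare the two computations on a few random points.
rng = np.random.default_rng(0)
points = rng.normal(size=(5, 2))
max_gap = max(
    abs(bayes_log_odds(x, mu0, mu1, cov, prior0, prior1) - (w @ x + b))
    for x in points
)
print("largest difference between Bayes log odds and w.x + b:", max_gap)
```

If the two classes had different covariance matrices, the quadratic terms in x would no longer cancel and the boundary would be quadratic rather than linear.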

Christopher M. Bishop's "Pattern Recognition and Machine Learning" is also a classic, and the whole book is a clear, clean read.

Of "The Elements of Statistical Learning" it has been well said that "Machine Learning: From Entry to Mastery" could serve as its subtitle, which shows how important the book is for advanced machine learning. It is worth mentioning that although the book has a Chinese edition, that edition is notorious for its poor translation. I have heard that the translator's background was in physical education.

Frank Höppner and co-authors' "Guide to Intelligent Data Analysis" is not as well known as the classics above, but it is well written and is recommended on the KNIME official website. It is billed as addressing data mining problems in real life and describes the standardized CRISP-DM process, and each chapter ends with application examples in R and KNIME.

Previous reading notes

Project:
