The Elements of Statistical Learning: Class notes (1)


Two days ago I saw on Weibo that Professor Wu Lide of Fudan's School of Computer Science is teaching a course on The Elements of Statistical Learning, and at the Zhangjiang campus no less. How could I miss a class taught by such a leading expert? I promptly took leave from work to sit in. To reduce the psychological pressure, I also dragged a bunch of colleagues along, so a dozen or so of us from eBay descended on the classroom. It felt as if we outnumbered the Fudan students, and the classroom of 50 or 60 people was packed. Quite a sight.

I had already been reading this book for a while, which is why I could follow the lecture. It really is a good book on data mining models. The authors' website offers a free electronic download: http://www-stat.stanford.edu/~tibs/ElemStatLearn/

Starting this week, I will post my class notes every week, and I will also add some of my own understanding and insights from actual work. If you are interested in data mining, you can also take courses on Coursera; this semester's Machine Learning course seems to be well reviewed. I have only enrolled in Model Thinking there, which is relatively simple but elegant! If I have time, I will write about my experience with that course later. I will try to write the notes entirely in Chinese, but I can only try...

------------ Class notes --------

The first class was mainly an introduction to what this field is about and to the plan for the rest of the course. It corresponds to Chapter 1 of the book.

1. What is statistical learning? Learning knowledge from data. Simply put, we have an outcome to be predicted, denoted Y, which may be discrete or continuous. We also have some observed features, denoted X, which may be one-dimensional or multidimensional. For every observed individual we get a row vector (x1, ..., xp) holding the observations of the p features, together with an observed outcome value y. If there are n individuals in total, we collect these values for each of them: (y1, ..., yn)^T is the column vector of observed outcomes, and X is an n × p matrix. Such data is called a training set. The notation here is mostly a matter of convention.
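To make the notation concrete, here is a minimal sketch of a training set in plain Python. The feature meanings (height, weight) and all the numbers are made up for illustration; only the shapes (an n × p matrix X and a length-n outcome vector y) come from the notes.

```python
# Hypothetical toy training set: n = 4 individuals, p = 2 features.
# Feature meanings (height in cm, weight in kg) and values are invented.
X = [
    [170.0, 65.0],
    [160.0, 50.0],
    [182.0, 80.0],
    [175.0, 72.0],
]
y = [1, 0, 1, 1]  # one observed outcome per individual (could also be continuous)

n = len(X)     # number of individuals (rows of X)
p = len(X[0])  # number of features (columns of X)

# Every row must hold p feature values, and every individual needs an outcome.
assert all(len(row) == p for row in X)
assert len(y) == n


def column(matrix, j):
    """Return feature j for all individuals (the j-th column of X)."""
    return [row[j] for row in matrix]


first_individual = X[0]       # the row vector (x1, ..., xp) for individual 1
first_feature = column(X, 0)  # all n observations of feature 1
```

In real work a numerical array library would usually hold X and y, but nested lists are enough to show the row and column structure.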

2. How is statistical learning categorized? Usually we have an observed outcome Y and look for a suitable model to predict Y from X. This is called supervised learning. In some cases Y cannot be observed, and learning from X alone is called unsupervised learning. This book focuses on supervised learning.

3. Regression and classification. This distinction depends mainly on Y. If Y is discrete, for example one of the colors red, yellow, and blue, the learning model is called a classifier. If Y is continuous, for example height, the learning model is called a regression. The difference is mostly one of naming.

4. What is the task of statistical learning? Prediction. What do we predict with? A learning model. How is the model learned? Some criterion is required, such as minimum mean squared error (MSE), or the 0-1 criterion (0-1 loss), which applies to classifiers. A method that carries out the optimization under such a criterion is called an algorithm.
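The two criteria mentioned above can be sketched in a few lines of Python. The function names and the sample values are my own choices for illustration, not something taken from the lecture.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference; a common criterion for regression."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((t - q) ** 2 for t, q in zip(y_true, y_pred)) / len(y_true)


def zero_one_loss(y_true, y_pred):
    """Fraction of misclassified samples; the 0-1 criterion for classifiers."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(1 for t, q in zip(y_true, y_pred) if t != q) / len(y_true)


# Regression example: squared errors are 0, 0 and 4, so the MSE is 4/3.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

# Classification example: one of four labels is wrong, so the loss is 0.25.
err = zero_one_loss(["red", "blue", "yellow", "red"],
                    ["red", "blue", "blue", "red"])
```

An "algorithm" in the sense of the notes is then whatever procedure searches for the model that makes one of these numbers as small as possible on the training set.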

5. Examples of statistical learning?

Classification: deciding whether an email is spam based on its sender, content, and subject line;

Regression: the relationship between prostate-specific antigen (PSA) level and other factors, such as cancer;

Image recognition: recognizing handwritten letters;

Clustering: determining the similarity of samples based on their DNA sequences, as in paternity testing.

6. In what order will the course proceed?

The second chapter is an overview of supervised learning models.

Chapter 3 and Chapter 4 discuss linear regression models and linear classifiers.

Chapter 5 discusses the generalized linear model (GLM).

Chapter 6 involves the kernel method and local regression.

Chapter 7 describes model evaluation and selection.

Chapter 8 focuses on algorithms, such as maximum likelihood estimation and the bootstrap. This semester is expected to get this far, so I will not list the later chapters.

By my estimate, the later material will get harder and harder. A while ago I studied Chapter 2 on my own, and it took me a long time. For my reading notes from that time, see my post "Some insights on dimensionality reduction models".

-------- 10.15 supplement ---------

Last week I wrote my notes from memory. Today I flipped through the notes I actually took in class, so I would like to add a few more items.

Chapter 9 covers additive models, that is, f(x1, ..., xp) = f1(x1) + ... + fp(xp)

Chapter 10 covers boosting

Chapter 11 covers neural networks

Chapter 12 covers support vector machines (SVM)

Chapter 13 covers prototype methods

Chapter 14 covers the transition from supervised learning to unsupervised learning (that is, from having both X and Y to having X but no Y)

Chapter 15 covers random forests

Chapter 16 covers ensemble learning

Chapter 17 covers graphical models

Chapter 18 covers high-dimensional problems (the curse of dimensionality I have been talking about recently. This year's Ig Nobel Prize is also related to this issue. For details, see http://www.guokr.com/article/344117)
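The additive form listed for Chapter 9 above says that the prediction is a sum of one-dimensional functions, one per feature. A minimal sketch, assuming three made-up component functions (in a real additive model these would be estimated from the training set, not written by hand):

```python
import math

# Hypothetical component functions, one per feature; chosen only for illustration.
component_functions = [
    lambda x1: 2.0 * x1,      # f1: linear in feature 1
    lambda x2: x2 ** 2,       # f2: quadratic in feature 2
    lambda x3: math.sin(x3),  # f3: nonlinear in feature 3
]


def additive_predict(x, components):
    """Evaluate f(x) = f1(x1) + ... + fp(xp) for one row vector x."""
    if len(x) != len(components):
        raise ValueError("need exactly one component function per feature")
    return sum(f_j(x_j) for f_j, x_j in zip(components, x))


# 2.0 * 1.0 + 3.0 ** 2 + sin(0.0) = 11.0
prediction = additive_predict([1.0, 3.0, 0.0], component_functions)
```

The point of the structure is that each feature contributes through its own function, with no interaction terms between features.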

PS. Professor Wu's comments on random forests and similar models were also very interesting. Roughly: we have not yet figured out why random forests work so well. In addition, this type of model is computationally intensive, that is, a very simple idea is carried out with a large amount of computation. These methods also involve more "guessing", and it is hard to understand the mechanism behind them, so in practice they seem less intuitive... (unlike econometrics, which is devoted to causality).
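The "simple idea carried out with a lot of computation" can be illustrated with bagging: fit many very simple models on bootstrap resamples and average them. Note that this is a sketch of plain bagging with one-split regression stumps, not a full random forest (there are no deep trees and no random feature subsets), and the toy data, function names, and parameter values are all my own assumptions.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump on a single feature by brute force."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:  # the largest x leaves nothing on the right side
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to the overall mean
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def bagged_stumps(xs, ys, n_models=200, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """Average the predictions of all stumps."""
    return sum(predict_stump(m, x) for m in models) / len(models)


# Toy data: a step from 0 to 1 between x = 4 and x = 5.
xs = [float(i) for i in range(10)]
ys = [0.0] * 5 + [1.0] * 5
models = bagged_stumps(xs, ys)
low = predict_bagged(models, 0.0)   # expected to be close to 0
high = predict_bagged(models, 9.0)  # expected to be close to 1
```

Each individual stump is trivial; the cost comes from repeating the brute-force fit for every resample, which is the trade-off described in the comment above.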
